HAEMOPHILUS ADHERENCE AND PENETRATION PROTEINS

FIELD OF THE INVENTION 

The invention relates to Haemophilus adhesion and 
penetration proteins, nucleic acids, and vaccines. 

BACKGROUND OF THE INVENTION 

Most bacterial diseases begin with colonization of a 
particular mucosal surface (Beachey et al., 1981, J. 
Infect. Dis. 143:325-345). Successful colonization 
requires that an organism overcome mechanical cleansing 
of the mucosal surface and evade the local immune 
response. The process of colonization is dependent upon 
specialized microbial factors that promote binding to 
host cells (Hultgren et al., 1993, Cell 73:887-901).
In some cases the colonizing organism will subsequently 
enter (invade) these cells and survive intracellularly 
(Falkow, 1991, Cell 65:1099-1102). 

Haemophilus influenzae is a common commensal organism 
of the human respiratory tract (Kuklinska and Kilian, 
1984, Eur. J. Clin. Microbiol. 3:249-252). It is a 
human-specific organism that normally resides in the
human nasopharynx and must colonize this site in order 
to avoid extinction. This microbe has a number of 
surface structures capable of promoting attachment to 
host cells (Guerina et al., 1982, J. Infect. Dis. 
146:564; Pichichero et al., 1982, Lancet ii:960-962; St.
Geme et al., 1993, Proc. Natl. Acad. Sci. U.S.A.
90:2875-2879). In addition, H. influenzae has acquired
the capacity to enter and survive within these cells
(Forsgren et al., 1994, Infect. Immun. 62:673-679;
St. Geme and Falkow, 1990, Infect. Immun. 58:4036-4044;
St. Geme and Falkow, 1991, Infect. Immun. 59:1325-1333;
Infect. Immun. 59:3366-3371).

H. influenzae is an important cause of both localized
respiratory tract and systemic disease (Turk, 1984, J.
Med. Microbiol. 18:1-16). Nonencapsulated
strains account for the majority of local disease (Turk,
1984, supra); in contrast, serotype b strains, which
express a capsule composed of a polymer of ribose and
ribitol-5-phosphate (PRP), are responsible for over 95%
of cases of H. influenzae systemic disease (Turk, 1982.
Clinical importance of Haemophilus influenzae, p. 3-9.
In S.H. Sell and P.F. Wright (ed.), Haemophilus
influenzae epidemiology, immunology, and prevention of
disease. Elsevier/North-Holland Publishing Co., New
York).

The initial step in the pathogenesis of disease due to
H. influenzae involves colonization of the upper
respiratory mucosa (Murphy et al., 1987, J. Infect. Dis.
5:723-731). Colonization with a particular strain may
persist for weeks to months, and most individuals remain
asymptomatic throughout this period (Spinola et al.,
1986, J. Infect. Dis. 154:100-109). However, in certain
circumstances colonization will be followed by
contiguous spread within the respiratory tract,
resulting in local disease in the middle ear, the
sinuses, the conjunctiva, or the lungs. Alternatively,
on occasion bacteria will penetrate the nasopharyngeal
epithelial barrier and enter the bloodstream.

In vitro observations and animal studies suggest that
bacterial surface appendages called pili (or fimbriae)
play an important role in H. influenzae colonization.
In 1982 two groups reported a correlation between
piliation and increased attachment to human
oropharyngeal epithelial cells and erythrocytes (Guerina
et al., supra; Pichichero et al., supra). Other
investigators have demonstrated that anti-pilus
antibodies block in vitro attachment by piliated H.
influenzae (Forney et al., 1992, J. Infect. Dis.
165:464-470; van Alphen et al., 1988, Infect. Immun.
56:1800-1806). Recently Weber et al. insertionally
inactivated the pilus structural gene in an H.
influenzae type b strain and thereby eliminated
expression of pili; the resulting mutant exhibited a
reduced capacity for colonization of year-old monkeys
(Weber et al., 1991, Infect. Immun. 59:4724-4728).

A number of reports suggest that nonpilus factors also
facilitate Haemophilus colonization. Using the human
nasopharyngeal organ culture model, Farley et al. (1986,
J. Infect. Dis. 161:274-280) and Loeb et al. (1988,
Infect. Immun. 49:484-489) noted that nonpiliated type
b strains were capable of mucosal attachment. Read and
coworkers made similar observations upon examining
nontypable strains in a model that employs nasal
turbinate tissue in organ culture (1991, J. Infect. Dis.
163:549-558). In the monkey colonization study by Weber
et al. (1991, supra), nonpiliated organisms retained a
capacity for colonization, though at reduced densities;
moreover, among monkeys originally infected with the
piliated strain, virtually all organisms recovered from
the nasopharynx were nonpiliated. All of these
observations are consistent with the finding that
nasopharyngeal isolates from children colonized with H.
influenzae are frequently nonpiliated (Mason et al.,
1985, Infect. Immun. 49:98-103; Brinton et al., 1989,
Pediatr. Infect. Dis. J. 8:554-561).

Previous studies have shown that H. influenzae are 
capable of entering (invading) cultured human epithelial 
cells via a pili-independent mechanism (St. Geme and
Falkow, 1990, supra; St. Geme and Falkow, 1991, supra).
Although H. influenzae is not generally considered an
intracellular parasite, a recent report suggests that
these in vitro findings may have an in vivo correlate
(Forsgren et al., 1994, supra). Forsgren and coworkers
examined adenoids from 10 children who had their 
adenoids removed because of longstanding secretory 
otitis media or adenoidal hypertrophy. In all 10 cases 
there were viable intracellular H. influenzae. Electron 
microscopy demonstrated that these organisms were 
concentrated in the reticular crypt epithelium and in 
macrophage-like cells in the subepithelial layer of 
tissue. One possibility is that bacterial entry into 
host cells provides a mechanism for evasion of the local 
immune response, thereby allowing persistence in the 
respiratory tract.

Thus a vaccine for the therapeutic and prophylactic 
treatment of Haemophilus infection is desirable. 
Accordingly, it is an object of the present invention 
to provide for recombinant Haemophilus Adherence and 
Penetration (HAP) proteins and variants thereof, and to 
produce useful quantities of these HAP proteins using 
recombinant DNA techniques. 

It is a further object of the invention to provide 
recombinant nucleic acids encoding HAP proteins, and 
expression vectors and host cells containing the nucleic
acid encoding the HAP protein. 

An additional object of the invention is to provide 
monoclonal antibodies for the diagnosis of Haemophilus 
infection. 

A further object of the invention is to provide methods 
for producing the HAP proteins, and a vaccine comprising 
the HAP proteins of the present invention. Methods for 
the therapeutic and prophylactic treatment of 
Haemophilus infection are also provided. 

SUMMARY OF THE INVENTION 

In accordance with the foregoing objects, the present 
invention provides recombinant HAP proteins, and 
isolated or recombinant nucleic acids which encode the 
HAP proteins of the present invention. Also provided 
are expression vectors which comprise DNA encoding a HAP 
protein operably linked to transcriptional and 
translational regulatory DNA, and host cells which 
contain the expression vectors.

The invention also provides methods for
producing HAP proteins which comprise culturing a host
cell transformed with an expression vector and causing 
expression of the nucleic acid encoding the HAP protein 
to produce a recombinant HAP protein. 

The invention also includes vaccines for Haemophilus 
influenzae infection comprising an HAP protein for 
prophylactic or therapeutic use in generating an immune 
response in a patient. Methods of treating or 
preventing Haemophilus influenzae infection comprise
administering a vaccine. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1A and 1B depict light micrographs of H.
influenzae strains DB117(pGJB103) and DB117(pN187)
incubated with Chang epithelial cells. Bacteria were
incubated with an epithelial monolayer for 30 minutes
before rinsing and staining with Giemsa stain. Figure
1A: H. influenzae strain DB117 carrying cloning vector
alone (pGJB103); Figure 1B: H. influenzae strain DB117
harboring recombinant plasmid pN187. Bar represents 3.5
µm.

Figures 2A, 2B, 2C and 2D depict thin section
transmission electron micrographs demonstrating
interaction of H. influenzae strains N187 and
DB117(pN187) with Chang epithelial cells. Bacteria were
incubated with epithelial monolayers for four hours
before rinsing and processing for examination by
transmission electron microscopy. Figure 2A: strain
N187 associated with the epithelial cell surface and
present in an intracellular location; Figure 2B: H.
influenzae DB117(pN187) in intimate contact with the
epithelial cell surface; Figure 2C: strain DB117(pN187)
in the process of entering an epithelial cell; Figure
2D: strain DB117(pN187) present in an intracellular
location. Bar represents 1 µm.

Figure 3 depicts outer membrane protein profiles of 
various strains. Outer membrane proteins were isolated 
on the basis of sarcosyl insolubility and resolved on 
a 10% SDS-polyacrylamide gel. Proteins were visualized 
by staining with Coomassie blue. Lane 1, H. influenzae 
strain DB117(pGJB103); lane 2, strain DB117(pN187); lane
3, strain DB117(pJS106); lane 4, E. coli HB101(pGJB103);
lane 5, HB101(pN187). Note novel proteins at ~160 kD
and 45 kD marked by asterisks in lanes 2 and 3.

Figure 4 depicts a restriction map of pN187 and
derivatives and locations of mini-Tn10 kan insertions.
pN187 is a derivative of pGJB103 that contains an 8.5-kb
Sau3AI fragment of chromosomal DNA from H. influenzae
strain N187. Vector sequences are represented by
hatched boxes. Letters above top horizontal line
indicate restriction enzyme sites: Bg, BglII; C, ClaI;
E, EcoRI; P, PstI. Numbers and lollipops above top
horizontal line show positions of mini-Tn10 kan
insertions; open lollipops represent insertions that
have no effect on adherence and invasion, while closed
lollipops indicate insertions that eliminate the
capacity of pN187 to promote association with epithelial
monolayers. Heavy horizontal line with arrow represents
location of hap locus within pN187 and direction of
transcription. (+): recombinant plasmids that promote
adherence and invasion; (-): recombinant plasmids that
fail to promote adherence and invasion.

Figure 5 depicts the identification of plasmid-encoded
proteins using the bacteriophage T7 expression system.
Bacteria were radiolabeled with [35S]methionine, and
whole cell lysates were resolved on a 10%
SDS-polyacrylamide gel. Proteins were visualized by
autoradiography. Lane 1, E. coli XL-1 Blue(pT7-7)
uninduced; lane 2, XL-1 Blue(pT7-7) induced with IPTG;
lane 3, XL-1 Blue(pJS103) uninduced; lane 4, XL-1
Blue(pJS103) induced with IPTG; lane 5, XL-1
Blue(pJS104) uninduced; lane 6, XL-1 Blue(pJS104)
induced with IPTG. The plasmids pJS103 and pJS104 are
derivatives of pT7-7 that contain the 6.5-kb PstI
fragment from pN187 in opposite orientations. Asterisk
indicates overexpressed protein in XL-1 Blue(pJS104).

Figures 6A, 6B, and 6C depict the nucleotide sequence
and predicted amino acid sequence of the hap gene. Putative
-10 and -35 sequences 5' to the hap coding sequence are
underlined; a putative rho-independent terminator 3' to
the hap stop codon is indicated with inverted arrows.
The first 25 amino acids of the protein, which are
boxed, represent the signal sequence.

Figures 7A, 7B, 7C, 7D, 7E, 7F, 7G, and 7H depict a
sequence comparison of the hap product and the cloned
H. influenzae IgA1 proteases. Amino acid homologies
between the deduced hap gene product and the iga gene
products from H. influenzae HK368, HK61, HK393, and
HK793 are shown. Dashes indicate gaps introduced in the
sequences in order to obtain maximal homology. A
consensus sequence for the five proteins is shown on the
lower line. The conserved serine-type protease
catalytic domain is underlined, and the common active
site serine is denoted by an asterisk. The conserved
cysteines are also indicated by asterisks.

Figure 8 depicts the IgA1 protease activity assay.
Culture supernatants were assayed for the ability to
cleave IgA1. Reaction mixtures were resolved on a 10%
SDS-polyacrylamide gel and then transferred to a
nitrocellulose membrane. The membrane was probed with
antibody against human IgA1 heavy chain. Lane 1, H.
influenzae strain N187; lane 2, strain DB117(pGJB103);
lane 3, strain DB117(pN187). The cleavage product
patterns suggest that strain N187 contains a type 2 IgA1
protease while strains DB117(pGJB103) and DB117(pN187)
contain a type 1 enzyme. The upper band of ~70 kD seen
with the DB117 derivatives represents intact IgA1 heavy
chain.

Figures 9A and 9B depict Southern analysis of
chromosomal DNA from H. influenzae strain N187, probing
with hap versus iga. DNA fragments were separated on
a 0.7% agarose gel and transferred bidirectionally to
nitrocellulose membranes prior to probing with either
hap or iga. Lane 1, N187 chromosomal DNA digested with
EcoRI; lane 2, N187 chromosomal DNA digested with BglII;
lane 3, N187 chromosomal DNA digested with BamHI; lane
4, the 4.8-kb ClaI-PstI fragment from pN187 that
contains the intact hap gene. Figure 9A: hybridization
with the 4.8-kb ClaI-PstI fragment containing the hap
gene; Figure 9B: hybridization with the iga gene from
H. influenzae strain Rd, carried as a 4.8-kb ClaI-EcoRI
fragment in pVDH6.

Figure 10 depicts an SDS-polyacrylamide gel of secreted
proteins. Bacteria were grown to late log phase, and
culture supernatants were precipitated with
trichloroacetic acid and then resolved on a 10%
SDS-polyacrylamide gel. Proteins were visualized by
staining with Coomassie blue. Lane 1, H. influenzae
strain DB117(pGJB103); lane 2, DB117(pN187); lane 3,
DB117(pJS106); lane 4, DB117(pJS102); lane 5,
DB117(pJS105); lane 6, DB117(Tn10-18); lane 7,
DB117(Tn10-4'); lane 8, DB117(Tn10-30); lane 9,
DB117(Tn10-16); lane 10, DB117(Tn10-10); lane 11,
DB117(Tn10-8); lane 12, N187. Asterisk indicates 110-kD
secreted protein encoded by hap.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides novel Haemophilus
Adhesion and Penetration (HAP) proteins. In a preferred
embodiment, the HAP proteins are from Haemophilus
strains, and in the preferred embodiment, from
Haemophilus influenzae. However, using the techniques
outlined below, HAP proteins from other Haemophilus
influenzae strains, or from other bacterial species such
as Neisseria spp. or Bordetella spp. may also be
obtained.

A HAP protein may be identified in several ways. A HAP 
nucleic acid or HAP protein is initially identified by 
substantial nucleic acid and/or amino acid sequence 
homology to the sequences shown in Figure 6. Such
homology can be based upon the overall nucleic acid or
amino acid sequence.

The HAP proteins of the present invention have limited
homology to Haemophilus influenzae and N. gonorrhoeae
serine-type IgA1 proteases. This homology, shown in
Figure 7, is approximately 30-35% at the amino acid
level, with several stretches showing 55-60% identity,
including amino acids 457-549, 399-466, 572-622, and
233-261. However, the homology between the HAP protein
and the IgA1 protease is considerably lower than the
similarity among the IgA1 proteases themselves.

In addition, the full length HAP protein has homology
to Tsh, a hemagglutinin expressed by an avian E. coli
strain (Provence and Curtiss, 1994, Infect. Immun.
62:1369-1380). The homology is greatest in the
N-terminal half of the proteins, and the overall homology
is 30.5%. The full length HAP protein also
has homology with pertactin, a 69 kD outer membrane
protein expressed by B. pertussis, with the middle
portion of the proteins showing 39% homology. Finally,
HAP has 34-52% homology with six regions of HpmA, a
calcium-independent hemolysin expressed by Proteus
mirabilis (Uphoff and Welch, 1990, J. Bacteriol.
172:1206-1216).

As used herein, a protein is a "HAP protein" if the 
overall homology of the protein sequence to the amino 
acid sequence shown in Figure 6 is preferably greater 
than about 40-50%, more preferably greater than about
60% and most preferably greater than 80%. In some
embodiments the homology will be as high as about 90 to
95 or 98%. This homology will be determined using
standard techniques known in the art, such as the Best
Fit sequence program described by Devereux et al., Nucl.
Acid Res. 12:387-395 (1984). The alignment may include
the introduction of gaps in the sequences to be aligned. 
In addition, for sequences which contain either more or 
fewer amino acids than the protein shown in Figure 6, 
it is understood that the percentage of homology will 
be determined based on the number of homologous amino 
acids in relation to the total number of amino acids. 
Thus, for example, homology of sequences shorter than 
that shown in Figure 6, as discussed below, will be 
determined using the number of amino acids in the 
shorter sequence.
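
The length-adjusted percentage described above can be illustrated with a minimal Python sketch. It is not the Best Fit program: it assumes the two sequences have already been aligned with gaps, it counts only identical residues as a simplification of homology, and it uses hypothetical peptide strings. The function name `percent_identity` and the `-` gap character are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical aligned residues, relative to the shorter ungapped sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (gaps never match).
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Denominator: residue count of the shorter sequence, gaps excluded.
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Hypothetical aligned peptides, for illustration only.
full_vs_full = percent_identity("MKT-AYL", "MKTQA-L")
full_vs_fragment = percent_identity("MKTAYL", "MKT---")
print(round(full_vs_full, 1), round(full_vs_fragment, 1))
```

In the second call the three-residue fragment matches at every one of its positions, so it scores 100% even though it covers only half of the longer sequence; this is the consequence of using the shorter sequence as the denominator.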

HAP proteins of the present invention may be shorter
than the amino acid sequence shown in Figure 6. As
shown in the Examples, the HAP protein may undergo
post-translational processing similar to that seen for the
serine-type IgA1 proteases expressed by Haemophilus
influenzae and N. gonorrhoeae. These proteases are
synthesized as preproteins with three functional
domains: the N-terminal signal peptide, the protease,
and a C-terminal helper domain. Following movement of
these proteins into the periplasmic space, the carboxy
terminal β-domain of the proenzyme is inserted into the
outer membrane, possibly forming a pore (Poulsen et al.,
1989, Infect. Immun. 57:3097-3105; Pohlner et al., 1987,
Nature (London) 325:458-462; Klauser et al., 1992,
EMBO J. 11:2327-2335; Klauser et al., 1993, J. Mol.
Biol. 234:579-593). Subsequently the amino end of the
protein is exported through the outer membrane, and
autoproteolytic cleavage occurs to result in secretion
of the mature 100 to 106 kD protease. The 45 to 56 kD
C-terminal β-domain remains associated with the outer
membrane following the cleavage event. As shown in the
Examples, the HAP nucleic acid is associated with
expression of a 160 kD outer membrane protein. The
secreted gene product is an approximately 110 kD
protein, with the simultaneous appearance of a 45 kD
outer membrane protein. The 45 kD protein appears to
correspond to amino acids from about 960 to about 1394
of Figure 6. Any one of these proteins is considered
a HAP protein for the purposes of this invention.
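
As a rough consistency check on the sizes given above, residue counts can be converted to approximate masses. The sketch below assumes the rule-of-thumb average of about 110 Da per amino acid residue, assumes that residue 1394 is the last residue of the Figure 6 sequence, and takes the 25-residue signal sequence from the description of Figure 6. The boundaries of the secreted fragment (residues 26 to 959) are likewise an assumption, and all values are estimates that will differ from apparent sizes on a gel.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average, not a measured value


def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive, 1-based residue range."""
    if last < first:
        raise ValueError("last residue must not precede first residue")
    return last - first + 1


def approx_mass_kd(n_residues: int) -> float:
    """Estimate protein mass in kD from a residue count."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


FULL_LENGTH = span_length(1, 1394)   # assumed full-length preprotein
C_TERMINAL = span_length(960, 1394)  # approximate outer membrane fragment
SECRETED = span_length(26, 959)      # assumed: after the signal sequence, before residue 960

for label, n in [
    ("full length", FULL_LENGTH),
    ("C-terminal fragment", C_TERMINAL),
    ("secreted fragment", SECRETED),
]:
    print(f"{label}: {n} residues, ~{approx_mass_kd(n):.0f} kD")
```

Under these assumptions the estimates come out near 153, 48, and 103 kD, in the same range as the 160, 45, and 110 kD species reported above.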

Thus, in a preferred embodiment, included within the 
definition of HAP proteins are portions or fragments of
the sequence shown in Figure 6. The fragments may be 
fragments of the entire sequence, the 110 kD sequence, 
or the 45 kD sequence. Generally, the HAP protein 
fragments may range in size from about 10 amino acids 
to about 1900 amino acids, with from about 50 to about 
1000 amino acids being preferred, and from about 100 to 
about 500 amino acids also preferred. Particularly 
preferred fragments are sequences unique to HAP; these 
sequences have particular use in cloning HAP proteins 
from other organisms or to generate antibodies specific
to HAP proteins. Unique sequences are easily identified
by those skilled in the art after examination of the HAP
protein sequence and comparison to other proteins; for
example, by examination of the sequence alignment shown
in Figure 7. For instance, as compared to the IgA
proteases, unique sequences include, but are not limited
to, amino acids 11-14, 16-22, 108-120, 155-164, 257-265,
281-288, 318-336, 345-353, 398-416, 684-693, 712-718,
753-761, 871-913, 935-953, 985-1008, 1023-1034,
1067-1076, 1440-1048, 1585-1592, 1631-1639, 1637-1648,
1735-1743, 1863-1871, 1882-1891, 1929-1941, and 1958-1966
(using the numbering of Figure 7). HAP protein
fragments which are included within the definition of
a HAP protein include N- or C-terminal truncations and
deletions which still allow the protein to be
biologically active; for example, which still exhibit
proteolytic activity in the case of the 110 kD putative
protease sequence. In addition, when the HAP protein
is to be used to generate antibodies, for example as a
vaccine, the HAP protein must share at least one epitope
or determinant with either the full length protein, the
110 kD protein or the 45 kD protein, shown in Figure 6.
In a preferred embodiment, the epitope is unique to the
HAP protein; that is, antibodies generated to a unique
epitope exhibit little or no cross-reactivity with other
proteins. By "epitope" or "determinant" herein is meant
a portion of a protein which will generate and/or bind
an antibody. Thus, in most instances, antibodies made
to a smaller HAP protein will be able to bind to the
full length protein.

In some embodiments, the fragments of the HAP protein
used to generate antibodies are small; thus, they may
be used as haptens and coupled to protein carriers to
generate antibodies, as is known in the art.

Preferably, the antibodies are generated to a portion 
of the HAP protein which remains attached to the 
Haemophilus influenzae organism. For example, the HAP 
protein can be used to vaccinate a patient to produce 
antibodies which upon exposure to the Haemophilus 
influenzae organism (e.g. during a subsequent infection) 
bind to the organism and allow an immune response. 
Thus, in one embodiment, the antibodies are generated 
to the roughly 45 kD fragment of the full length HAP
protein. Preferably, the antibodies are generated to 
the portion of the 45 kD fragment which is exposed at 
the outer membrane. 

In an alternative embodiment, the antibodies bind to the 
mature secreted 110 kD fragment. For example, as 
explained in detail below, the HAP proteins of the 
present invention may be administered therapeutically 
to generate neutralizing antibodies to the 110 kD 
putative protease, to decrease the undesirable effects 
of the 110 kD fragment.



In the case of the nucleic acid, the overall homology
of the nucleic acid sequence is commensurate with amino
acid homology but takes into account the degeneracy in
the genetic code and codon bias of different organisms.
Accordingly, the nucleic acid sequence homology may be
either lower or higher than that of the protein
sequence. Thus the homology of the nucleic acid
sequence as compared to the nucleic acid sequence of
Figure 6 is preferably greater than 40%, more preferably
greater than about 60% and most preferably greater than
80%. In some embodiments the homology will be as high
as about 90 to 95 or 98%.
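
The effect of codon degeneracy on the two homology measures can be seen in a small worked example. The sketch below uses two hypothetical 12-nucleotide sequences (not taken from Figure 6) and a deliberately partial codon table containing only the codons needed here; the function names are assumptions made for illustration.

```python
# Partial standard-genetic-code table: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M",
    "CTG": "L", "TTA": "L",
    "TCC": "S", "AGC": "S",
    "CGT": "R", "AGA": "R",
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon using the partial table above."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(a: str, b: str) -> float:
    """Percent of positions that are identical between two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical coding sequences that differ only in synonymous codon choice.
seq_a = "ATGCTGTCCCGT"
seq_b = "ATGTTAAGCAGA"

protein_identity = identity(translate(seq_a), translate(seq_b))
nucleotide_identity = identity(seq_a, seq_b)
print(protein_identity, nucleotide_identity)
```

Both sequences translate to the same four-residue peptide, so the protein-level identity is 100% while only 6 of the 12 nucleotide positions match. This is one way the nucleic acid homology can fall below the protein homology.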

In one embodiment, the nucleic acid homology is 
determined through hybridization studies. Thus, for 
example, nucleic acids which hybridize under high 
stringency to all or part of the nucleic acid sequence 
shown in Figure 6 are considered HAP protein genes. 
High stringency conditions include washes with 0.1X SSC
at 65°C for 2 hours. 
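
The wash concentration can be made concrete with a short calculation. The sketch assumes the commonly used definition of 1X SSC as 150 mM sodium chloride and 15 mM sodium citrate, which is not stated in this document; the function name `ssc_components_mm` is hypothetical.

```python
# Assumed composition of 1X SSC in millimolar (not given in the text above).
SSC_1X_MM = {"sodium chloride": 150.0, "sodium citrate": 15.0}


def ssc_components_mm(fold: float) -> dict:
    """Return component concentrations (mM) for an SSC solution of the given strength."""
    if fold <= 0:
        raise ValueError("fold concentration must be positive")
    return {name: conc * fold for name, conc in SSC_1X_MM.items()}


wash = ssc_components_mm(0.1)  # the 0.1X wash named in the text
for name, conc in wash.items():
    print(f"{name}: {conc:.1f} mM")
```

The low salt concentration, combined with the 65°C wash temperature, is what makes the condition stringent: less well matched duplexes are less stable at low ionic strength.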

The HAP proteins and nucleic acids of the present 
invention are preferably recombinant. As used herein, 
"nucleic acid" may refer to either DNA or RNA, or 
molecules which contain both deoxy- and ribonucleotides. 
The nucleic acids include genomic DNA, cDNA and 
oligonucleotides including sense and anti-sense nucleic
acids. Specifically included within the definition of
nucleic acid are anti-sense nucleic acids. An
anti-sense nucleic acid will hybridize to the corresponding
non-coding strand of the nucleic acid sequence shown in
Figure 6, but may contain ribonucleotides as well as
deoxyribonucleotides. Generally, anti-sense nucleic
acids function to prevent expression of mRNA, such that
a HAP protein is not made, or made at reduced levels.
The nucleic acid may be double stranded, single
stranded, or contain portions of both double stranded
and single stranded sequence. By the term "recombinant
nucleic acid" herein is meant nucleic acid, originally
formed in vitro by the manipulation of nucleic acid by
endonucleases, in a form not normally found in nature.
Thus an isolated HAP protein gene, in a linear form, or
an expression vector formed in vitro by ligating DNA
molecules that are not normally joined, are both
considered recombinant for the purposes of this
invention. It is understood that once a recombinant
nucleic acid is made and reintroduced into a host cell
or organism, it will replicate non-recombinantly, i.e.
using the in vivo cellular machinery of the host cell
rather than in vitro manipulations; however, such
nucleic acids, once produced recombinantly, although
subsequently replicated non-recombinantly, are still
considered recombinant for the purposes of the
invention.

Similarly, a "recombinant protein" is a protein made 
using recombinant techniques, i.e. through the 
expression of a recombinant nucleic acid as depicted 
above. A recombinant protein is distinguished from 
naturally occurring protein by at least one or more 
characteristics. For example, the protein may be 
isolated away from some or all of the proteins and 
compounds with which it is normally associated in its 
wild type host, or found in the absence of the host 
cells themselves. Thus, the protein may be partially 
or substantially purified. The definition includes the 
production of a HAP protein from one organism in a 
different organism or host cell. Alternatively, the 
protein may be made at a significantly higher 
concentration than is normally seen, through the use of 
an inducible promoter or high expression promoter, such
that the protein is made at increased concentration
levels. Alternatively, the protein may be in a form not
normally found in nature, as in the addition of an
epitope tag or amino acid substitutions, insertions and
deletions.

Also included within the definition of HAP protein are HAP
proteins from other organisms, which are cloned and 
expressed as outlined below. 

In the case of anti-sense nucleic acids, an anti-sense
nucleic acid is defined as one which will hybridize to
all or part of the corresponding non-coding sequence of
the sequence shown in Figure 6. Generally, the
hybridization conditions used for the determination of
anti-sense hybridization will be high stringency
conditions, such as 0.1X SSC at 65°C.
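
Whether an oligonucleotide can base-pair with a given target along its full length comes down to reverse complementarity. The sketch below is a general illustration using a hypothetical six-base DNA sequence, not a sequence from Figure 6; it ignores stringency, mismatches, and RNA bases, and the function names are assumptions.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    if set(dna.upper()) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G, and T")
    return dna.translate(_COMPLEMENT)[::-1]


def fully_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two strands can pair perfectly along their full length."""
    return strand_a.upper() == reverse_complement(strand_b).upper()


target = "ATGCCG"                 # hypothetical target fragment
probe = reverse_complement(target)  # oligonucleotide designed against it
print(probe, fully_complementary(target, probe))
```

A probe built this way pairs with every base of the target; shortening the probe to a subsequence of the reverse complement corresponds to hybridizing to part of the target rather than all of it.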

Once the HAP protein nucleic acid is identified, it can 
be cloned and, if necessary, its constituent parts 
recombined to form the entire HAP protein nucleic acid. 
Once isolated from its natural source, e.g., contained 
within a plasmid or other vector or excised therefrom 
as a linear nucleic acid segment, the recombinant HAP 
protein nucleic acid can be further used as a probe to 
identify and isolate other HAP protein nucleic acids. 
It can also be used as a "precursor" nucleic acid to
make modified or variant HAP protein nucleic acids and
proteins.

Using the nucleic acids of the present invention which 
encode HAP protein, a variety of expression vectors are 
made. The expression vectors may be either self- 
replicating extrachromosomal vectors or vectors which 
integrate into a host genome. Generally, these 
expression vectors include transcriptional and 
translational regulatory nucleic acid operably linked 
to the nucleic acid encoding the HAP protein. "Operably 
linked" in this context means that the transcriptional 
and translational regulatory DNA is positioned relative 
to the coding sequence of the HAP protein in such a 
manner that transcription is initiated. Generally, this 
will mean that the promoter and transcriptional 
initiation or start sequences are positioned 5' to the 
HAP protein coding region. The transcriptional and 
translational regulatory nucleic acid will generally be
appropriate to the host cell used to express the HAP
protein; for example, transcriptional and translational
regulatory nucleic acid sequences from Bacillus will be
used to express the HAP protein in Bacillus. Numerous
types of appropriate expression vectors, and suitable
regulatory sequences are known in the art for a variety
of host cells.

In general, the transcriptional and translational
regulatory sequences may include, but are not limited
to, promoter sequences, leader or signal sequences,
ribosomal binding sites, transcriptional start and stop
sequences, translational start and stop sequences, and
enhancer or activator sequences. In a preferred
embodiment, the regulatory sequences include a promoter
and transcriptional start and stop sequences.

Promoter sequences encode either constitutive or
inducible promoters. The promoters may be either
naturally occurring promoters or hybrid promoters.
Hybrid promoters, which combine elements of more than
one promoter, are also known in the art, and are useful
in the present invention.

In addition, the expression vector may comprise 
additional elements. For example, the expression vector 
may have two replication systems, thus allowing it to 
be maintained in two organisms, for example in mammalian 
or insect cells for expression and in a procaryotic host 
for cloning and amplification. Furthermore, for 
integrating expression vectors, the expression vector 
contains at least one sequence homologous to the host
cell genome, and preferably two homologous sequences
which flank the expression construct. The integrating
vector may be directed to a specific locus in the host
cell by selecting the appropriate homologous sequence
for inclusion in the vector. Constructs for integrating
vectors are well known in the art.

In addition, in a preferred embodiment, the expression 
vector contains a selectable marker gene to allow the 
selection of transformed host cells. Selection genes 
are well known in the art and will vary with the host 
cell used. 

The HAP proteins of the present invention are produced 
by culturing a host cell transformed with an expression 
vector containing nucleic acid encoding a HAP protein, 
under the appropriate conditions to induce or cause 
expression of the HAP protein. The conditions 
appropriate for HAP protein expression will vary with 
the choice of the expression vector and the host cell, 
and will be easily ascertained by one skilled in the art 
through routine experimentation. For example, the use 
of constitutive promoters in the expression vector will 
require optimizing the growth and proliferation of the 
host cell, while the use of an inducible promoter 
requires the appropriate growth conditions for 
induction. In addition, in some embodiments, the timing 
of the harvest is important. For example, the 
baculoviral systems used in insect cell expression are 
lytic viruses, and thus harvest time selection can be 
crucial for product yield. 

Appropriate host cells include yeast, bacteria,
archaebacteria, fungi, and insect and animal cells,
including mammalian cells. Of particular interest are
Drosophila melanogaster cells, Saccharomyces cerevisiae
and other yeasts, E. coli, Bacillus subtilis, SF9 cells,
C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and
HeLa cells, and immortalized mammalian myeloid and
lymphoid cell lines.

In a preferred embodiment, HAP proteins are expressed
in bacterial systems. Bacterial expression systems are 
well known in the art. 

A suitable bacterial promoter is any nucleic acid 
sequence capable of binding bacterial RNA polymerase and 
initiating the downstream (3') transcription of the 
coding sequence of HAP protein into mRNA. A bacterial 
promoter has a transcription initiation region which is 
usually placed proximal to the 5' end of the coding 
sequence. This transcription initiation region
typically includes an RNA polymerase binding site and
a transcription initiation site. Sequences encoding 
metabolic pathway enzymes provide particularly useful 
promoter sequences. Examples include promoter sequences 
derived from sugar metabolizing enzymes, such as 
galactose, lactose and maltose, and sequences derived 
from biosynthetic enzymes such as tryptophan. Promoters 
from bacteriophage may also be used and are known in the
art. In addition, synthetic promoters and hybrid
promoters are also useful; for example, the tac promoter 
is a hybrid of the trp and lac promoter sequences. 
Furthermore, a bacterial promoter can include naturally 
occurring promoters of non-bacterial origin that have 
the ability to bind bacterial RNA polymerase and 
initiate transcription. 

In addition to a functioning promoter sequence, an
efficient ribosome binding site is desirable. In E.
coli, the ribosome binding site is called the Shine-
Dalgarno (SD) sequence and includes an initiation codon
and a sequence 3-9 nucleotides in length located 3-11
nucleotides upstream of the initiation codon.
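
The spacing rule just described can be illustrated with a
short sketch. It is not part of the disclosed method. It
assumes the commonly cited SD core AGGAGG, treats any
fragment of that core at least 4 nucleotides long as a
candidate (both are illustrative assumptions), and counts
the gap as the number of nucleotides between the end of
the SD-like fragment and the ATG. The function names are
hypothetical.

```python
# Assumption: AGGAGG as the SD core and a 4-nt minimum match are
# illustrative choices, not values taken from the specification.
SD_CORE = "AGGAGG"


def sd_fragments(min_len=4):
    """All substrings of the SD core of at least min_len, longest first."""
    frags = {
        SD_CORE[i:j]
        for i in range(len(SD_CORE))
        for j in range(i + min_len, len(SD_CORE) + 1)
    }
    return sorted(frags, key=lambda f: (-len(f), f))


def best_sd_for_start(dna, start, min_gap=3, max_gap=11, min_len=4):
    """Longest SD-like fragment ending 3-11 nt upstream of dna[start:start+3]."""
    for frag in sd_fragments(min_len):
        for gap in range(min_gap, max_gap + 1):
            pos = start - gap - len(frag)
            if pos >= 0 and dna[pos:pos + len(frag)] == frag:
                return (pos, frag, gap, start)
    return None


def find_sd_sites(dna, min_gap=3, max_gap=11, min_len=4):
    """Return (sd_pos, sd_seq, gap, atg_pos) for every ATG with an SD-like site."""
    dna = dna.upper()
    hits = []
    start = dna.find("ATG")
    while start != -1:
        hit = best_sd_for_start(dna, start, min_gap, max_gap, min_len)
        if hit is not None:
            hits.append(hit)
        start = dna.find("ATG", start + 1)
    return hits


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    print(find_sd_sites("TTTAGGAGGTTTTTTATGAAA"))
```

Positions are 0-based. A real screen would also tolerate
mismatches to the core, which this sketch does not attempt.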

The expression vector may also include a signal peptide 
sequence that provides for secretion of the HAP protein 
in bacteria. The signal sequence typically encodes a 
signal peptide comprised of hydrophobic amino acids 
which direct the secretion of the protein from the cell, 
as is well known in the art. The protein is either 
secreted into the growth media (gram-positive bacteria) 
or into the periplasmic space, located between the inner 
and outer membrane of the cell (gram-negative bacteria).

The bacterial expression vector may also include a 
selectable marker gene to allow for the selection of 
bacterial strains that have been transformed. Suitable 
selection genes include genes which render the bacteria 
resistant to drugs such as ampicillin, chloramphenicol, 
erythromycin, kanamycin, neomycin and tetracycline. 
Selectable markers also include biosynthetic genes, such 
as those in the histidine, tryptophan and leucine 
biosynthetic pathways.

These components are assembled into expression vectors.
Expression vectors for bacteria are well known in the
art, and include vectors for Bacillus subtilis, E. coli,
Streptococcus cremoris, and Streptococcus lividans,
among others.

The bacterial expression vectors are transformed into 
bacterial host cells using techniques well known in the 
art, such as calcium chloride treatment, 
electroporation, and others. 



WO 96/05858 PCT/US95/10661



In one embodiment, HAP proteins are produced in insect
cells. Expression vectors for the transformation of
insect cells, and in particular, baculovirus-based
expression vectors, are well known in the art. Briefly,
baculovirus is a very large DNA virus which produces its
coat protein at very high levels. Due to the size of
the baculoviral genome, exogenous genes must be placed
in the viral genome by recombination. Accordingly, the
components of the expression system include: a transfer
vector, usually a bacterial plasmid, which contains both
a fragment of the baculovirus genome, and a convenient
restriction site for insertion of the nucleic acid
encoding the HAP protein; a wild type baculovirus with a
sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous
recombination of the heterologous gene into the
baculovirus genome); and appropriate insect host cells
and growth media.

Mammalian expression systems are also known in the art
and are used in one embodiment. A mammalian promoter
is any DNA sequence capable of binding mammalian RNA
polymerase and initiating the downstream (3')
transcription of a coding sequence for HAP protein into
mRNA. A promoter will have a transcription initiating
region, which is usually placed proximal to the 5' end
of the coding sequence, and a TATA box, usually located
25-30 base pairs upstream of the transcription
initiation site. The TATA box is thought to direct RNA
polymerase II to begin RNA synthesis at the correct
site. A mammalian promoter will also contain an
upstream promoter element, typically located within 100
to 200 base pairs upstream of the TATA box. An upstream
promoter element determines the rate at which
transcription is initiated and can act in either
orientation. Of particular use as mammalian promoters
are the promoters from mammalian viral genes, since the
viral genes are often highly expressed and have a broad
host range. Examples include the SV40 early promoter,
mouse mammary tumor virus LTR promoter, adenovirus major
late promoter, and herpes simplex virus promoter.

Typically, transcription termination and polyadenylation
sequences recognized by mammalian cells are regulatory
regions located 3' to the translation stop codon and
thus, together with the promoter elements, flank the
coding sequence. The 3' terminus of the mature mRNA is
formed by site-specific post-transcriptional cleavage and
polyadenylation. Examples of transcription terminator
and polyadenylation signals include those derived from
SV40.

The methods of introducing exogenous nucleic acid into
mammalian hosts, as well as other hosts, are well known
in the art, and will vary with the host cell used.
Techniques include dextran-mediated transfection,
calcium phosphate precipitation, polybrene-mediated
transfection, protoplast fusion, electroporation,
encapsulation of the polynucleotide(s) in liposomes, and
direct microinjection of the DNA into nuclei.

In a preferred embodiment, HAP protein is produced in
yeast cells. Yeast expression systems are well known
in the art, and include expression vectors for
Saccharomyces cerevisiae, Candida albicans and C.
maltosa, Hansenula polymorpha, Kluyveromyces fragilis
and K. lactis, Pichia guilliermondii and P. pastoris,
Schizosaccharomyces pombe, and Yarrowia lipolytica.
Preferred promoter sequences for expression in yeast
include the inducible GAL1,10 promoter, the promoters
from alcohol dehydrogenase, enolase, glucokinase,
glucose-6-phosphate isomerase, glyceraldehyde-3-
phosphate-dehydrogenase, hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, pyruvate
kinase, and the acid phosphatase gene. Yeast selectable
markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which
confers resistance to tunicamycin; the G418 resistance
gene, which confers resistance to G418; and the CUP1
gene, which allows yeast to grow in the presence of
copper ions.

A recombinant HAP protein may be expressed
intracellularly or secreted. The HAP protein may also
be made as a fusion protein, using techniques well known
in the art. Thus, for example, if the desired epitope
is small, the HAP protein may be fused to a carrier
protein to form an immunogen. Alternatively, the HAP
protein may be made as a fusion protein to increase
expression.

Also included within the definition of HAP proteins of
the present invention are amino acid sequence variants.
These variants fall into one or more of three classes:
substitutional, insertional or deletional variants.
These variants ordinarily are prepared by site-specific
mutagenesis of nucleotides in the DNA encoding the HAP
protein, using cassette mutagenesis or other techniques
well known in the art, to produce DNA encoding the
variant, and thereafter expressing the DNA in
recombinant cell culture as outlined above. However,
variant HAP protein fragments having up to about 100-150
residues may be prepared by in vitro synthesis using
established techniques. Amino acid sequence variants
are characterized by the predetermined nature of the
variation, a feature that sets them apart from naturally
occurring allelic or interspecies variation of the HAP
protein amino acid sequence. The variants typically
exhibit the same qualitative biological activity as the
naturally occurring analogue, although variants can also
be selected which have modified characteristics as will
be more fully outlined below.

While the site or region for introducing an amino acid 
sequence variation is predetermined, the mutation per 
se need not be predetermined. For example, in order to 
optimize the performance of a mutation at a given site, 
random mutagenesis may be conducted at the target codon 
or region and the expressed HAP protein variants 
screened for the optimal combination of desired 
activity. Techniques for making substitution mutations 
at predetermined sites in DNA having a known sequence 
are well known, for example, M13 primer mutagenesis. 
Screening of the mutants is done using assays of HAP 
protein activities; for example, mutated HAP genes are 
placed in HAP deletion strains and tested for HAP 
activity, as disclosed herein. The creation of deletion 
strains, given a gene sequence, is known in the art. 
For example, nucleic acid encoding the variants may be 
expressed in a Haemophilus influenzae strain deficient 
in the HAP protein, and the adhesion and infectivity of 
the variant Haemophilus influenzae evaluated. 
Alternatively, the variant HAP protein may be expressed 
and its biological characteristics evaluated, for 
example its proteolytic activity. 

Amino acid substitutions are typically of single 
residues; insertions usually will be on the order of 
from about 1 to 20 amino acids, although considerably
larger insertions may be tolerated. Deletions range
from about 1 to 30 residues, although in some cases
deletions may be much larger, as for example when one
of the domains of the HAP protein is deleted.

Substitutions, deletions, insertions or any combination 
thereof may be used to arrive at a final derivative. 
Generally these changes are done on a few amino acids 
to minimize the alteration of the molecule. However, 
larger changes may be tolerated in certain
circumstances.

When small alterations in the characteristics of the HAP 
protein are desired, substitutions are generally made 
in accordance with the following chart: 

Chart I

Original Residue    Exemplary Substitutions

Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu
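
As an illustration only, Chart I can be held as a lookup
table and used to ask whether a proposed substitution
counts as conservative in the sense of the chart. The
sketch below is not part of the disclosure. The table is
our transcription of the chart as best it can be read and
should be checked against the chart itself, and the
function name is hypothetical.

```python
# Transcription of Chart I (three-letter residue codes). This mapping is
# an assumption to be verified against the chart before any real use.
CHART_I = {
    "Ala": {"Ser"},
    "Arg": {"Lys"},
    "Asn": {"Gln", "His"},
    "Asp": {"Glu"},
    "Cys": {"Ser"},
    "Gln": {"Asn"},
    "Glu": {"Asp"},
    "Gly": {"Pro"},
    "His": {"Asn", "Gln"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"},
    "Thr": {"Ser"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_chart_substitution(original, replacement):
    """True if `replacement` is listed for `original` in the table above."""
    return replacement in CHART_I.get(original, set())


if __name__ == "__main__":
    print(is_chart_substitution("Ile", "Val"))
    print(is_chart_substitution("Gly", "Trp"))
```

Note that the chart is directional: a residue listed as a
substitute for another is not necessarily listed in the
reverse direction.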



Substantial changes in function or immunological
identity are made by selecting substitutions that are
less conservative than those shown in Chart I. For
example, substitutions may be made which more
significantly affect: the structure of the polypeptide
backbone in the area of the alteration, for example the
alpha-helical or beta-sheet structure; the charge or
hydrophobicity of the molecule at the target site; or
the bulk of the side chain. The substitutions which in
general are expected to produce the greatest changes in
the polypeptide's properties are those in which (a) a
hydrophilic residue, e.g. seryl or threonyl, is
substituted for (or by) a hydrophobic residue, e.g.
leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b)
a cysteine or proline is substituted for (or by) any
other residue; (c) a residue having an electropositive
side chain, e.g. lysyl, arginyl, or histidyl, is
substituted for (or by) an electronegative residue, e.g.
glutamyl or aspartyl; or (d) a residue having a bulky
side chain, e.g. phenylalanine, is substituted for (or
by) one not having a side chain, e.g. glycine.

The variants typically exhibit the same qualitative
biological activity and will elicit the same immune
response as the naturally-occurring analogue, although
variants also are selected to modify the characteristics
of the polypeptide as needed. Alternatively, the
variant may be designed such that the biological
activity of the HAP protein is altered. For example,
the proteolytic activity of the larger 110 kD domain of
the HAP protein may be altered, through the substitution
of the amino acids of the active site. The putative
catalytic domain of this protein is GDSGSPMF, with the
first serine corresponding to the active site serine
characteristic of serine type proteases. The residues
of the active site may be individually or simultaneously
altered to decrease or eliminate proteolytic activity.
This may be done to decrease the toxicity or side
effects of the vaccine. Similarly, the cleavage site
between the 45 kD domain and the 100 kD domain may be
altered, for example to eliminate proteolytic processing
to form the two domains. Putatively this site is at
residue 960.
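
A minimal sketch of the active-site alteration described
above: it locates the GDSGSPMF motif in a protein sequence
and replaces the first serine of the motif. The choice of
alanine as the replacement, the function name, and the toy
sequence are illustrative assumptions; the text does not
specify which residue is substituted.

```python
MOTIF = "GDSGSPMF"  # putative catalytic motif named in the text


def mutate_active_site_serine(protein, replacement="A"):
    """Replace the first serine of the GDSGSPMF motif.

    Returns (zero_based_position, mutated_sequence).
    The default replacement (alanine) is an assumption.
    """
    idx = protein.find(MOTIF)
    if idx == -1:
        raise ValueError("motif GDSGSPMF not found")
    ser_pos = idx + MOTIF.index("S")  # first serine within the motif
    mutated = protein[:ser_pos] + replacement + protein[ser_pos + 1:]
    return ser_pos, mutated


if __name__ == "__main__":
    # Toy sequence, invented for illustration; not the Hap sequence.
    toy = "MKTAGDSGSPMFLLK"
    print(mutate_active_site_serine(toy))
```

In practice the change would be introduced at the DNA
level by mutagenesis of the corresponding codon, and the
variant then assayed for proteolytic activity as outlined
above.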

In a preferred embodiment, the HAP protein is purified
or isolated after expression. HAP proteins may be
isolated or purified in a variety of ways known to those
skilled in the art depending on what other components
are present in the sample. Standard purification
methods include electrophoretic, molecular,
immunological and chromatographic techniques, including
ion exchange, hydrophobic, affinity, and reverse-phase
HPLC chromatography, and chromatofocusing. For example,
the HAP protein may be purified using a standard anti-
HAP antibody column. Ultrafiltration and diafiltration
techniques, in conjunction with protein concentration,
are also useful. For general guidance in suitable
purification techniques, see Scopes, R., Protein
Purification, Springer-Verlag, NY (1982). The degree
of purification necessary will vary depending on the use
of the HAP protein. In some instances no purification
will be necessary.

Once expressed and purified if necessary, the HAP
proteins are useful in a number of applications.

For example, the HAP proteins can be coupled, using
standard technology, to affinity chromatography columns.
These columns may then be used to purify antibodies from
samples obtained from animals or patients exposed to the
Haemophilus influenzae organism. The purified
antibodies may then be used as outlined below.

Additionally, the HAP proteins are useful to make
antibodies to HAP proteins. These antibodies find use
in a number of applications. In a preferred embodiment,
the antibodies are used to diagnose the presence of a
Haemophilus influenzae infection in a sample or patient.
This will be done using techniques well known in the
art; for example, samples such as blood or tissue
samples may be obtained from a patient and tested for
reactivity with the antibodies, for example using
standard techniques such as ELISA. In a preferred
embodiment, monoclonal antibodies are generated to the
HAP protein, using techniques well known in the art.
As outlined above, the antibodies may be generated to
the full length HAP protein, or a portion of the HAP
protein.

Antibodies generated to HAP proteins may also be used 
in passive immunization treatments, as is known in the
art.

Antibodies generated to unique sequences of HAP proteins 
may also be used to screen expression libraries from 
other organisms to find, and subsequently clone, HAP 
nucleic acids from other organisms. 

In one embodiment, the antibodies may be directly or 
indirectly labelled. By "labelled" herein is meant a 
compound that has at least one element, isotope or 
chemical compound attached to enable the detection of 
the compound. In general, labels fall into three 
classes: a) isotopic labels, which may be radioactive 
or heavy isotopes; b) immune labels, which may be 
antibodies or antigens; and c) colored or fluorescent 
dyes. The labels may be incorporated into the compound 
at any position. Thus, for example, the HAP protein
antibody may be labelled for detection, or a secondary
antibody to the HAP protein antibody may be created and
labelled.

In one embodiment, the antibodies generated to the HAP
proteins of the present invention are used to purify or
separate HAP proteins or the Haemophilus influenzae
organism from a sample. Thus for example, antibodies
generated to HAP proteins which will bind to the
Haemophilus influenzae organism may be coupled, using
standard technology, to affinity chromatography columns.
These columns can be used to pull out the Haemophilus
organism from environmental or tissue samples.
Alternatively, antibodies generated to the soluble 110
kD portion of the full-length protein shown in Figure 7
may be used to purify the 110 kD protein from samples.

In a preferred embodiment, the HAP proteins of the
present invention are used as vaccines for the
prophylactic or therapeutic treatment of a Haemophilus
influenzae infection in a patient. By "vaccine" herein
is meant an antigen or compound which elicits an immune
response in an animal or patient. The vaccine may be
administered prophylactically, for example to a patient
never previously exposed to the antigen, such that
subsequent infection by the Haemophilus influenzae
organism is prevented. Alternatively, the vaccine may
be administered therapeutically to a patient previously
exposed or infected by the Haemophilus influenzae
organism. While infection cannot be prevented, in this
case an immune response is generated which allows the
patient's immune system to more effectively combat the
infection. Thus, for example, there may be a decrease
or lessening of the symptoms associated with infection.

A "patient" for the purposes of the present invention
includes both humans and other animals and organisms.
Thus the methods are applicable to both human therapy
and veterinary applications.

The administration of the HAP protein as a vaccine is 
done in a variety of ways. Generally, the HAP proteins 
can be formulated according to known methods to prepare 
pharmaceutically useful compositions, whereby 
therapeutically effective amounts of the HAP protein are 
combined in admixture with a pharmaceutically acceptable 
carrier vehicle. Suitable vehicles and their
formulation are well known in the art. Such
compositions will contain an effective amount of the HAP 
protein together with a suitable amount of vehicle in 
order to prepare pharmaceutically acceptable 
compositions for effective administration to the host. 
The composition may include salts, buffers, carrier 
proteins such as serum albumin, targeting molecules to 
localize the HAP protein at the appropriate site or 
tissue within the organism, and other molecules. The 
composition may include adjuvants as well. 

In one embodiment, the vaccine is administered as a 
single dose; that is, one dose is adequate to induce a 
sufficient immune response to prophylactically or 
therapeutically treat a Haemophilus influenzae 
infection. In alternate embodiments, the vaccine is 
administered as several doses over a period of time, as 
a primary vaccination and "booster" vaccinations. 

By "therapeutically effective amounts" herein is meant 
an amount of the HAP protein which is sufficient to 
induce an immune response. This amount may be different 
depending on whether prophylactic or therapeutic
treatment is desired. Generally, this ranges from about
0.001 mg to about 1 gm, with a preferred range of about
0.05 to about , and the preferred dose being .
These amounts may be adjusted if adjuvants are used.

The following examples serve to more fully describe the
manner of using the above-described invention, as well
as to set forth the best modes contemplated for carrying
out various aspects of the invention. It is understood
that these examples in no way serve to limit the true
scope of this invention, but rather are presented for
illustrative purposes.

EXAMPLES 

Example 1 
Cloning of the HAP protein 

Bacterial strains, plasmids, and phage. H. influenzae
strain N187 is a clinical isolate that was originally
cultivated from the middle ear fluid of a child with
acute otitis media. This strain was classified as
nontypable based on the absence of agglutination with
typing antisera for H. influenzae types a-f (Burroughs
Wellcome) and the failure to hybridize with pU038, a
plasmid that contains the entire cap b locus (Kroll and
Moxon, 1988, J. Bacteriol. 170:859-864).

H. influenzae strain DB117 is a rec1 mutant of Rd, a
capsule-deficient serotype d strain that has been in the
laboratory for over 40 years (Alexander and Leidy, 1951,
J. Exp. Med. 83:345-359); DB117 was obtained from G.
Barcak (University of Maryland, Baltimore, MD) (Setlow
et al., 1968). DB117 is deficient for in vitro
adherence and invasion, as assayed below.

H. influenzae strain 12 is the nontypable strain from
which the genes encoding the HMW1 and HMW2 proteins were
cloned (Barenkamp and Leininger, 1992, Infect. Immun.
60:1302-1313); HMW1 and HMW2 are the prototypic members
of a family of nontypable Haemophilus antigenically-
related high-molecular-weight adhesive proteins (St.
Geme et al., 1993).

E. coli HB101, which is nonadherent and noninvasive, has
been previously described (Sambrook et al., 1989,
Molecular cloning: a laboratory manual, 2nd ed. Cold
Spring Harbor Laboratory, Cold Spring Harbor, N.Y.).
E. coli DH5α was obtained from Bethesda Research
Laboratories. E. coli MC1061 was obtained from H.
Kimsey (Tufts University, Boston, MA). E. coli XL-1
Blue and the plasmid pBluescript KS- were obtained from
Stratagene. Plasmid pT7-7 and phage mGP1-2 were
provided by S. Tabor (Harvard Medical School, Boston,
MA) (Tabor and Richardson, 1985, Proc. Natl. Acad. Sci.
USA. 82:1074-1078). The E. coli-Haemophilus shuttle
vector pGJB103 (Tomb et al., 1989, J. Bacteriol.
171:3796-3802) and phage λ1105 (Way et al., 1984, Gene.
32:369-379) were provided by G. Barcak (University of
Maryland, Baltimore, MD). Plasmid pVD116 harbors the
IgA1 protease gene from H. influenzae strain Rd (Koomey
and Falkow, 1984, Infect. Immun. 43:101-107) and was
obtained from M. Koomey (University of Michigan, Ann
Arbor, MI).

Growth conditions. H. influenzae strains were grown as
described (Anderson et al., 1972, J. Clin. Invest.
51:31-38). They were stored at -80°C in brain heart
infusion broth with 25% glycerol. E. coli strains were
grown on LB agar or in LB broth. They were stored at
-80°C in LB broth with 50% glycerol.

For H. influenzae, tetracycline was used in a
concentration of 5 μg/ml and kanamycin was used in a
concentration of 25 μg/ml. For E. coli, antibiotics
were used in the following concentrations:
tetracycline, 12.5 μg/ml; kanamycin, 50 μg/ml;
ampicillin, 100 μg/ml.

Recombinant DNA methods. DNA ligations, restriction
endonuclease digestions, and gel electrophoresis were
performed according to standard techniques (Sambrook et
al., 1989, supra). Plasmids were introduced into E.
coli strains by either chemical transformation or
electroporation, as described (Sambrook et al., 1989,
supra; Dower et al., 1988, Nucleic Acids Res. 16:6127-
6145). In H. influenzae, transformation was performed
using the MIV method of Herriott et al. (1970, J.
Bacteriol. 101:517-524), and electroporation was carried
out using the protocol developed for E. coli (Dower et
al., 1988, supra).



Construction of genomic library from H. influenzae
strain N187. High-molecular-weight chromosomal DNA was
prepared from 3 ml of an overnight broth culture of H.
influenzae N187 as previously described (Mekalanos,
1983, Cell. 35:253-263). Following partial digestion
with Sau3AI, 8 to 12 kb fragments were eluted onto DEAE
paper (Schleicher & Schuell, Keene, N.H.) and then
ligated to BglII-digested calf intestine phosphatase-
treated pGJB103. The ligation mixture was
electroporated into H. influenzae DB117, and
transformants were selected on media containing
tetracycline.



Transposon mutagenesis. Mutagenesis of plasmid DNA was
performed using the mini-Tn10 kan element described by
Way et al. (1984, supra). Initially, the appropriate
plasmid was introduced into E. coli MC1061. The
resulting strain was infected with λ1105, which carries
the mini-Tn10 kan transposon. Transductants were grown
overnight in the presence of kanamycin and an antibiotic
to select for the plasmid, and plasmid DNA was isolated
using the alkaline lysis method. In order to recover
plasmids containing a transposon insertion, plasmid DNA
was electroporated into E. coli DH5α, plating on media
containing kanamycin and the appropriate second
antibiotic.

In order to establish more precisely the region of pN187
involved in promoting interaction with host cells,
initially this plasmid was subjected to restriction
endonuclease analysis. Subsequently, several subclones
were constructed in the vector pGJB103 and were
reintroduced into H. influenzae strain DB117. The
resulting strains were then examined for adherence and
invasion. As summarized in Figure 4, subclones
containing either a 3.9-kb PstI-BglII fragment (pJS105)
or the adjoining 4.2-kb BglII fragment (pJS102) failed
to confer the capacity to associate with Chang cells.
In contrast, a subclone containing an insert that
included portions of both of these fragments (pJS106)
did promote interaction with epithelial monolayers.
Transposon mutagenesis performed on pN187 confirmed that
the flanking portions of the insert in this plasmid were
not required for the adherent/invasive phenotype. On
the other hand, a transposon insertion located adjacent
to the BglII site in pJS106 eliminated adherence and
invasion. An insertion between the second EcoRI and
PstI sites in this plasmid had a similar effect (Figure
4).

Examination of plasmid-encoded proteins.

In order to examine plasmid-encoded proteins, relevant
DNA was ligated into the bacteriophage T7 expression
vector pT7-7, and the resulting construct was
transformed into E. coli XL-1 Blue. Plasmid pT7-7
contains the T7 phage φ10 promoter and ribosomal binding
site upstream of a multiple cloning site (Tabor and
Richardson, 1985, supra). The T7 promoter was induced
by infection with the recombinant M13 phage mGP1-2 and
addition of isopropyl-β-D-thiogalactopyranoside (final
concentration, 1 mM). Phage mGP1-2 contains the gene
encoding T7 RNA polymerase, which activates the φ10
promoter in pT7-7 (Tabor and Richardson, 1985, supra).

Like DB117(pN187), strain DB117 carrying pJS106
expressed new outer membrane proteins 160-kD and 45-kD
in size (Figure 3, lane 3). In order to examine whether
the 6.5-kb insert in pJS106 actually encodes these
proteins, this fragment of DNA was ligated into the
bacteriophage T7 expression vector pT7-7. The resulting
plasmid containing the insert in the same orientation
as in pN187 was designated pJS104, and the plasmid with
the insert in the opposite orientation was designated
pJS103. Both pJS104 and pJS103 were introduced into
E. coli XL-1 Blue, producing XL-1 Blue(pJS104) and XL-1
Blue(pJS103), respectively. As a negative control, pT7-7
was also transformed into XL-1 Blue. The T7 promoter
was induced in these three strains by infection with the
recombinant M13 phage mGP1-2 and addition of isopropyl-
β-D-thiogalactopyranoside (final concentration, 1 mM),
and induced proteins were detected using [35S]
methionine. As shown in Figure 5, induction of XL-1
Blue(pJS104) resulted in expression of a 160-kD protein
and several smaller proteins which presumably represent
degradation products. In contrast, when XL-1
Blue(pJS103) and XL-1 Blue(pT7-7) were induced, there
was no expression of these proteins. There was no 45-kD
protein induced in any of the three strains. This
experiment suggested that the 6.5-kb insert present in
pJS106 contains the structural gene for the 160-kD outer
membrane protein identified in DB117(pJS106). On the
other hand, this analysis failed to establish the origin
of the 45-kD membrane protein expressed by
DB117(pJS106).

Adherence and invasion assays.

Adherence and invasion assays were performed with Chang
epithelial cells [Wong-Kilbourne derivative, clone 1-5c-
4 (human conjunctiva)], which were seeded into wells of
24-well tissue culture plates as previously described
(St. Geme and Falkow, 1990). Adherence was measured
after incubating bacteria with epithelial monolayers for
30 minutes as described (St. Geme et al., 1993).
Invasion assays were carried out according to our
original protocol and involved incubating bacteria with
epithelial cells for four hours followed by treatment
with gentamicin for two hours (100 μg/ml) (St. Geme and
Falkow, 1990).

Nucleotide sequence determination and analysis.

Nucleotide sequence was determined using a Sequenase kit
and double stranded plasmid template. DNA fragments
were subcloned into pBluescript KS- and sequenced along
both strands by primer walking. DNA sequence analysis
was performed using the Genetics Computer Group (GCG)
software package from the University of Wisconsin
(Devereux et al., 1984). Sequence similarity searches
were carried out using the BLAST program of the National
Center for Biotechnology Information (Altschul et al.,
1990, J. Mol. Biol. 215:403-410). The DNA sequence
described here will be deposited in the
EMBL/GenBank/DDBJ Nucleotide Sequence Data Libraries.

Based on our subcloning results, we reasoned that
the central BglII site in pN187 was positioned within
an open reading frame. Examination of a series of
mini-Tn10 kan mutants supported this conclusion (Figure 4).
Consequently, we sequenced DNA on either side of this
BglII site and identified a 4182 bp gene, which we have
designated hap for Haemophilus adherence and penetration
(Figure 6). This gene encodes a 1394 amino acid
polypeptide, which we have called Hap, with a calculated
molecular mass of 155.4 kD, in good agreement with the
molecular mass of the larger of the two novel outer
membrane proteins expressed by DB117(pN187) and the
protein expressed after induction of XL-1 Blue/pJS104.
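
The relationship between the gene length and the polypeptide
length stated above can be checked with simple arithmetic. The
following is a minimal sketch, not part of the original
analysis: the coordinates 60..4241 are taken from the CDS
feature of SEQ ID NO:1, while the function names and the figure
of 110 daltons per residue (a common rule of thumb) are
assumptions introduced here for illustration only. The
calculated mass of 155.4 kD reported above would have been
derived from the actual amino acid composition, so the rough
estimate is expected to differ slightly.

```python
# Illustrative check of the hap coding-region arithmetic.
# CDS coordinates come from the sequence listing (SEQ ID NO:1);
# the 110 Da average residue mass is a rule-of-thumb assumption.

CDS_START = 60
CDS_END = 4241
AVG_RESIDUE_MASS_DA = 110.0  # assumption, not from the source


def cds_length(start: int, end: int) -> int:
    """Length in nucleotides of an inclusive, 1-based interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def codon_count(n_nucleotides: int) -> int:
    """Number of complete codons; the length must be a multiple of 3."""
    if n_nucleotides % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    return n_nucleotides // 3


def rough_mass_kd(n_residues: int) -> float:
    """Very rough protein mass estimate in kilodaltons."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    length = cds_length(CDS_START, CDS_END)
    residues = codon_count(length)
    print(length, "bp;", residues, "codons;",
          round(rough_mass_kd(residues), 1), "kD (rough)")
```

With these inputs the coding region is 4182 bp and 1394 codons,
and the rule-of-thumb estimate comes to roughly 153 kD, of the
same order as the calculated 155.4 kD.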

The hap gene has a G+C content of 39.1%, similar to the
published estimate of 38.7% for the whole genome
(Kilian, 1976, J. Gen. Microbiol. 93:9-62). Putative
-10 and -35 promoter sequences are present upstream of
the initiation codon. A consensus ribosomal binding
site is lacking. A sequence similar to a
rho-independent transcription terminator is present
beginning 39 nucleotides beyond the stop codon and
contains interrupted inverted repeats with the potential
for forming a hairpin structure containing a loop of
three bases and a stem of eight bases. Similar to the
situation with typical E. coli terminators, this
structure is followed by a stretch rich in T residues.
Analysis of the predicted amino acid sequence suggested
the presence of a 25 amino acid signal peptide at the
amino terminus. This region has characteristics typical
of procaryotic signal peptides, with three positive
N-terminal charges, a central hydrophobic region, and
alanine residues at positions 23 and 25 (-3 and -1
relative to the putative cleavage site) (von Heijne,
1984, J. Mol. Biol. 173:243-251).
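
The sequence features described in this paragraph can be
illustrated with a short sketch. The 25-residue string below is
the amino terminus of SEQ ID NO:2 written in one-letter code;
the helper names and the ten-residue window used to count
N-terminal charges are hypothetical choices made for this
example, and the G+C helper is shown only as a generic formula,
not as the GCG routine actually used.

```python
# Illustrative sketch: G+C percentage and simple signal peptide checks.
# SIGNAL_PEPTIDE is residues 1-25 of SEQ ID NO:2 in one-letter code.
# Function names and the default window size are assumptions.

SIGNAL_PEPTIDE = "MKKTVFRLNFLTACISLGIVSQAWA"


def gc_percent(dna: str) -> float:
    """Percentage of G and C bases in a DNA string (spaces ignored)."""
    bases = dna.upper().replace(" ", "")
    if not bases:
        raise ValueError("empty sequence")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def n_terminal_positive_charges(peptide: str, window: int = 10) -> int:
    """Count Lys (K) and Arg (R) residues in the first `window` residues."""
    return sum(aa in "KR" for aa in peptide[:window])


def residues_at(peptide: str, positions) -> dict:
    """Map 1-based positions to the residue found at each position."""
    return {p: peptide[p - 1] for p in positions}


if __name__ == "__main__":
    print(len(SIGNAL_PEPTIDE))
    print(n_terminal_positive_charges(SIGNAL_PEPTIDE))
    print(residues_at(SIGNAL_PEPTIDE, (23, 25)))
```

Applied to the string above, the checks find three positively
charged residues near the amino terminus and alanine at
positions 23 and 25, matching the description in the text.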

Comparison of the deduced amino acid sequence of Hap
with other proteins. A protein sequence similarity
search was performed with the predicted amino acid
sequence using the BLAST network service of the National
Center for Biotechnology Information (Altschul et al.,
1990, supra). This search revealed homology with the
IgA1 proteases of H. influenzae and Neisseria
gonorrhoeae. Alignment of the derived amino acid
sequences for the hap gene product and the IgA1
proteases from four different H. influenzae strains
revealed homology across the extent of the proteins
(Figure 7), with several stretches showing 55-60%
identity and 70-80% similarity. Similar levels of
homology were noted between the hap product and the IgA1
protease from N. gonorrhoeae strain MS11. This
homology includes the region identified as the catalytic
site of the IgA1 proteases, which is comprised of the
sequence GDSGSPLF, where S is the active site serine
characteristic of serine proteases (Brenner, 1988,
Nature (London). 334:528-530; Poulsen et al., 1992, J.
Bacteriol. 174:2913-2921). In the case of Hap, the
corresponding sequence is GDSGSPMF. The hap product
also contains two cysteines corresponding to the
cysteines proposed to be important in forming the
catalytic domain of the IgA1 proteases (Pohlner et al.,
1987, supra). Overall there is 30-35% identity and
51-55% similarity between the hap gene product and the H.
influenzae and N. gonorrhoeae IgA1 proteases.
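
As a small worked example of what "percent identity" means for
the catalytic-site motifs quoted above, the sketch below
compares the two eight-residue sequences position by position.
It is an ungapped comparison written for illustration only; the
function names are hypothetical, and the figures reported in the
text for whole proteins would have come from alignment software
that also handles gaps and conservative substitutions.

```python
# Illustrative ungapped comparison of two equal-length peptide motifs.
# The motifs are quoted from the text; function names are assumptions.

IGA1_MOTIF = "GDSGSPLF"  # IgA1 protease catalytic-site sequence
HAP_MOTIF = "GDSGSPMF"   # corresponding Hap sequence


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions holding identical residues."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mismatches(a: str, b: str) -> list:
    """List of (1-based position, residue in a, residue in b) differences."""
    return [(i, x, y)
            for i, (x, y) in enumerate(zip(a, b), start=1)
            if x != y]


if __name__ == "__main__":
    print(percent_identity(IGA1_MOTIF, HAP_MOTIF))
    print(mismatches(IGA1_MOTIF, HAP_MOTIF))
```

The two motifs differ only at the seventh position (leucine in
the IgA1 proteases, methionine in Hap), so seven of eight
positions are identical.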

The deduced amino acid sequence encoded by hap was also
found to contain significant homology to Tsh, a
hemagglutinin expressed by an avian E. coli strain
(Provence and Curtiss, 1994, supra). This homology
extends throughout both proteins but is greatest in the
N-terminal half of each. Overall the two proteins are
30.5% identical and 51.6% similar. Tsh is also
synthesized as a preprotein and is secreted as a smaller
form; like the IgA1 proteases and perhaps Hap, a
carboxy-terminal peptide remains associated with the outer
membrane (D. Provence, personal communication). While
this protein is presumed to have proteolytic activity,
its substrate has not yet been determined.
Interestingly, Tsh was first identified on the basis of
its capacity to promote agglutination of erythrocytes.
Thus Hap and Tsh are possibly the first members of a
novel class of adhesive proteins that are processed
analogously to the IgA1 proteases.

Homology was also noted with pertactin, a 69-kD outer
membrane protein expressed by B. pertussis (Charles et
al., 1989, Proc. Natl. Acad. Sci. USA. 86:3554-3558).
The middle portions of these two molecules are 39%
identical and nearly 60% similar. Pertactin contains
the amino acid triplet arginine-glycine-aspartic acid
(RGD) and has been shown to promote attachment to
cultured mammalian cells via this sequence (Leininger
et al., 1991, Proc. Natl. Acad. Sci. USA. 88:345-349).
Although Bordetella species are not generally considered
intracellular parasites, work by Ewanowich and coworkers
indicates that these respiratory pathogens are capable
of in vitro entry into human epithelial cells (Ewanowich
et al., 1989, Infect. Immun. 57:2698-2704; Ewanowich et
al., 1989, Infect. Immun. 57:1240-1247). Recently
Leininger et al. reported that preincubation of
epithelial monolayers with an RGD-containing peptide
derived from the pertactin sequence specifically
inhibited B. pertussis entry (Leininger et al., 1992,
Infect. Immun. 60:2380-2385). In addition, these
investigators found that coating of Staphylococcus
aureus with purified pertactin resulted in more
efficient S. aureus entry; the RGD-containing peptide
from pertactin inhibited this pertactin-enhanced entry
by 75%. Although the hap product lacks an RGD motif,
it is possible that Hap and pertactin serve similar
biologic functions for H. influenzae and Bordetella
species, respectively.

Additional analysis revealed significant homology (34
to 52% identity, 42 to 70% similarity) with six regions
of HpmA, a calcium-independent hemolysin expressed by
Proteus mirabilis (Uphoff and Welch, 1990, supra).

The hap locus is distinct from the H. influenzae IgA1
protease gene.

Given the degree of similarity between the hap gene
product and H. influenzae IgA1 protease, we wondered
whether we had isolated the IgA1 protease gene of strain
N187. To examine this possibility, we performed IgA1
protease activity assays. Among H. influenzae strains,
two enzymatically distinct types of IgA1 protease have
been found (Mulks et al., 1982, J. Infect. Dis.
146:266-274). Type 1 enzymes cleave the Pro-Ser peptide bond
between residues 231 and 232 in the hinge region of
human IgA1 heavy chain and generate fragments of roughly
28-kD and 31-kD; type 2 enzymes cleave the Pro-Thr bond
between residues 235 and 236 in the hinge region and
generate 26.5-kD and 32.5-kD fragments. Previous
studies of the parent strain from which DB117 was
derived have demonstrated that this strain produces a
type 1 IgA1 protease (Koomey and Falkow, 1984, supra).
As shown in Figure 8, comparison of the proteolytic
activities of strain DB117 and strain N187 suggested
that N187 produces a type 2 IgA1 protease. We reasoned
that DB117(pN187) might generate a total of four
fragments from IgA1 protease, consistent with two
distinct cleavage specificities. Examination of
DB117(pN187) revealed instead that this transformant
produces the same two fragments of the IgA1 heavy chain
as does DB117, arguing that this strain produces only
a type 1 enzyme.

In an effort to obtain additional evidence against the
possibility that plasmid pN187 contains the N187 IgA1
protease gene, we performed a series of Southern blots.
As shown in Figure 9, when genomic DNA from strain N187
was digested with EcoRI, BglII, or BamHI and then probed
with the hap gene, one set of hybridizing fragments was
detected. Probing of the same DNA with the iga gene
from H. influenzae strain Rd resulted in a different set
of hybridizing bands. Moreover, the iga gene failed to
hybridize with a purified 4.8-kb fragment that contained
the intact hap gene.

The recombinant plasmid associated with adherence and
invasion encodes a secreted protein.

The striking homology between the hap gene product and
the Haemophilus and Neisseria IgA1 proteases suggested
the possibility that these proteins might be processed
in a similar manner. The IgA1 proteases are synthesized
as preproteins with three functional domains: the
N-terminal signal peptide, the protease, and a C-terminal
helper domain, which is postulated to form a pore in the
outer membrane for secretion of the protease (Poulsen
et al., 1989, supra; Pohlner et al., 1987, supra). The
C-terminal peptide remains associated with the outer
membrane following an autoproteolytic cleavage event
that results in release of the mature enzyme.
Consistent with the possibility that the hap gene
product follows a similar fate, we found that
DB117(pN187) produced a secreted protein approximately
110-kD in size that was absent from DB117(pGJB103)
(Figure 10). This protein was also produced by
DB117(pJS106), but not by DB117(pJS102) or
DB117(pJS105). Furthermore, the two mutants with
transposon insertions within the hap coding region were
deficient in this protein. In order to determine the
relationship between hap and the secreted protein, this
protein was transferred to a PVDF membrane and
N-terminal amino acid sequencing was performed. Excessive
background on the first cycle precluded identification
of the first amino acid residue of the free amino
terminus. The sequence of the subsequent seven residues
was found to be HTYFGID, which corresponds to amino
acids 27 through 33 of the hap product.
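
The correspondence between the sequenced peptide and the
deduced Hap sequence can be reproduced with a simple substring
search. The sketch below is illustrative only: the 48-residue
string is the amino terminus of SEQ ID NO:2 converted to
one-letter code, and the function name is a hypothetical helper
introduced for this example.

```python
# Illustrative lookup of the N-terminal peptide within the Hap sequence.
# HAP_N_TERM holds residues 1-48 of SEQ ID NO:2 in one-letter code.
# The helper name locate_peptide is an assumption for this sketch.

HAP_N_TERM = (
    "MKKTVFRLNFLTACIS"  # residues 1-16
    "LGIVSQAWAGHTYFGI"  # residues 17-32
    "DYQYYRDFAENKGKFT"  # residues 33-48
)

SEQUENCED_PEPTIDE = "HTYFGID"  # cycles 2-8 of the Edman degradation


def locate_peptide(protein: str, peptide: str):
    """Return 1-based (first, last) residue numbers of the first match,
    or None when the peptide does not occur in the protein."""
    index = protein.find(peptide)
    if index < 0:
        return None
    return index + 1, index + len(peptide)


if __name__ == "__main__":
    print(locate_peptide(HAP_N_TERM, SEQUENCED_PEPTIDE))
    # Residue 26 is the position that could not be read on cycle 1.
    print(HAP_N_TERM[25])
```

The match spans residues 27 through 33. Residue 26, which lies
immediately after the 25 amino acid signal peptide and would
have been released on the unreadable first cycle, is glycine in
the deduced sequence.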

The introduction of hap into laboratory strains of E.
coli was unable to endow these organisms with
the capacity for adherence or invasion. In considering
these results, it is noteworthy that the E. coli
transformants failed to express either the 160-kD or the
45-kD outer membrane protein. Accordingly, they also
failed to express the 110-kD secreted protein. The
explanation for this lack of expression is unclear. One
possibility is that the H. influenzae promoter or
ribosomal binding site was poorly recognized in E. coli.
Indeed the putative -35 sequence upstream of the hap
initiation codon is fairly divergent from the σ70
consensus sequence, and the ribosomal binding site is
unrecognizable. Alternatively, an accessory gene may
be required for proper export of the Hap protein,
although the striking homology with the IgA1 proteases,
which are normally expressed and secreted in E. coli,
argues against this hypothesis.

In considering the possibility that the hap gene product
promotes adherence and invasion by directly binding to
a host cell surface structure, it seems curious that the
mature protein is secreted from the organism. However,
there are examples of other adherence factors that are
also secreted. Filamentous hemagglutinin is a 220-kD
protein expressed by B. pertussis that mediates in vitro
adherence and facilitates natural colonization (Relman
et al., 1989, Proc. Natl. Acad. Sci. U.S.A.
86:2637-2641; Kimura et al., 1990, Infect. Immun. 58:7-16).
This protein remains surface-associated to some extent
but is also released from the cell. The process of
filamentous hemagglutinin secretion involves an
accessory protein designated FhaC, which appears to be
localized to the outer membrane (Willems et al., 1994,
Molec. Microbiol. 11:337-347). Similarly, the Ipa
proteins implicated in Shigella invasion are also
secreted. Secretion of these proteins requires the
products of multiple genes within the mxi and spa loci
(Allaoui et al., 1993, Molec. Microbiol. 7:59-68;
Andrews et al., 1991, Infect. Immun. 59:1997-2005;
Venkatesan et al., 1992, J. Bacteriol. 174:1990-2001).
It is conceivable that secretion is simply a consequence
of the mechanism for export of the hap gene product to
the surface of the organism. However, it is noteworthy
that the secreted protein contains a serine-type
protease catalytic domain and shows homology with the
P. mirabilis hemolysin. These findings suggest that the
mature Hap protein may possess proteolytic activity and
raise the possibility that Hap promotes interaction with
the host cell at a distance by modifying the host cell
surface. Alternatively, Hap may modify the bacterial
surface in order to facilitate interaction with a host
cell receptor. It is possible that hap encodes a
molecule with dual functions, serving as both adhesin
and protease.

Analysis of outer membrane and secreted proteins.

Outer membrane proteins were isolated on the basis of
sarcosyl insolubility according to the method of Carlone
et al. (1986, J. Clin. Microbiol. 24:330-332). Secreted
proteins were isolated by centrifuging bacterial
cultures at 16,000 g for 10 minutes, recovering the
supernatant, and precipitating with trichloroacetic acid
at a final concentration of 10%. SDS-polyacrylamide gel
electrophoresis was performed as previously described
(Laemmli, 1970, Nature (London). 227:680-685).

To identify proteins that might be involved in the
interaction with the host cell surface, outer membrane
protein profiles for DB117(pN187) and DB117(pGJB103)
were compared. As shown in Figure 3, DB117(pN187)
expressed two new outer membrane proteins: a
high-molecular-weight protein approximately 160-kD in size
and a 45-kD protein. E. coli HB101 harboring pN187
failed to express these proteins, suggesting an
explanation for the observation that HB101(pN187) is
incapable of adherence or invasion.



Previous studies have demonstrated that a family of
antigenically-related high-molecular-weight proteins
with similarity to filamentous hemagglutinin of
Bordetella pertussis mediate attachment by nontypable
H. influenzae to cultured epithelial cells (St. Geme et
al., 1993). To explore the possibility that the gene
encoding the strain N187 member of this family was
cloned, whole cell lysates of N187, DB117(pN187), and
DB117(pGJB103) were examined by Western immunoblot. Our
control strain for this experiment was H. influenzae
strain 12. Using a polyclonal antiserum directed
against HMW1 and HMW2, the prototypic proteins in this
family, we identified a 140-kD protein in strain N187
(not shown). In contrast, this antiserum failed to
react with either DB117(pN187) or DB117(pGJB103) (not
shown), indicating that pN187 has no relationship to HMW
protein expression.

Determination of amino terminal sequence. Secreted
proteins were precipitated with trichloroacetic acid,
separated on a 10% SDS-polyacrylamide gel, and
electrotransferred to a polyvinylidene difluoride (PVDF)
membrane (Matsudaira, 1987, J. Biol. Chem.
262:10035-10038). Following staining with Coomassie Brilliant
Blue R-250, the 110-kD protein was cut from the PVDF
membrane and submitted to the Protein Chemistry
Laboratory at Washington University School of Medicine
for amino terminal sequence determination. Sequence
analysis was performed by automated Edman degradation
using an Applied Biosystems Model 470A protein
sequencer.

Examination of IgA1 protease activity. In order to
assess IgA1 protease activity, bacteria were inoculated
into broth and grown aerobically overnight. Samples
were then centrifuged in a microfuge for two minutes,
and supernatants were collected. A 10 µl volume of
supernatant was mixed with 16 µl of 0.5 ng/ml human IgA1
(Calbiochem), and chloramphenicol was added to a final
concentration of 2 µg/ml. After overnight incubation
at 37 °C, reaction mixtures were electrophoresed on a 10%
SDS-polyacrylamide gel, transferred to a nitrocellulose
membrane, and probed with goat anti-human IgA1 heavy
chain conjugated to alkaline phosphatase (Kirkegaard &
Perry). The membrane was developed by immersion in
phosphatase substrate solution
(5-bromo-4-chloro-3-indolylphosphate toluidinium-nitro blue
tetrazolium substrate system; Kirkegaard & Perry).

Immunoblot analysis. Immunoblot analysis of bacterial
whole cell lysates was carried out as described (St.
Geme et al., 1991).

Southern hybridization. Southern blotting was performed
using high stringency conditions as previously described
(St. Geme and Falkow, 1991).

Microscopy. 

i. Light microscopy. Samples of epithelial cells with
associated bacteria were stained with Giemsa stain and
examined by light microscopy as described (St. Geme and
Falkow, 1990).

ii. Transmission electron microscopy. For transmission
electron microscopy, bacteria were incubated with
epithelial cell monolayers for four hours and were then
rinsed four times with PBS, fixed with 2%
glutaraldehyde/1% osmium tetroxide in 0.1 M sodium
phosphate buffer pH 6.4 for two hours on ice, and
stained with 0.25% aqueous uranyl acetate overnight.
Samples were then dehydrated in graded ethanol solutions
and embedded in polybed. Ultrathin sections (0.4 µm)
were examined in a Phillips 201c electron microscope.

As shown in Figure 2, DB117(pN187) incubated with
monolayers for four hours demonstrated intimate
interaction with the epithelial cell surface and was
occasionally found to be intracellular. In a given thin
section, invaded cells generally contained one or two
intracellular organisms. Of note, intracellular bacteria
were more common in sections prepared with strain N187,
an observation consistent with results using the
gentamicin assay. In contrast, examination of samples
prepared with strain DB117 carrying cloning vector alone
(pGJB103) failed to reveal internalized bacteria (not
shown).

Having described the preferred embodiments of the
present invention it will appear to those of ordinary
skill in the art that various modifications may be made
to the disclosed embodiments, and that such
modifications are intended to be within the scope of the
present invention.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: Washington University, et al.
(ii) TITLE OF INVENTION: Haemophilus Adherence and Penetration Protein 
(iii) NUMBER OF SEQUENCES: 9 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert 

(B) STREET: 4 Embarcadero Center, Suite 3400 

(C) CITY: San Francisco 

(D) STATE: California 

(E) COUNTRY: United States 

(F) ZIP: 94111-4187 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US95/

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/296,791

(B) FILING DATE: 25 AUG 1994 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Trecartin, Richard F. 

(B) REGISTRATION NUMBER: 31,801 

(C) REFERENCE/DOCKET NUMBER: FP-59941/RFT

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (415) 781-1989 

(B) TELEFAX: (415) 398-3249 

(C) TELEX: 910 277299 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4319 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: both 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 60..4241

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TCAATAGTCG TTTAACTAGT ATTTTTTAAT ACGAAAAATT ACTTAATTAA ATAAACATT 59 

ATG AAA AAA ACT GTA TTT CGT CTT AAT TTT TTA ACC GCT TGC ATT TCA 107 
Met Lys Lys Thr Val Phe Arg Leu Asn Phe Leu Thr Ala Cys Ile Ser
1               5                   10                  15

TTA GGG ATA GTA TCG CAA GCG TGG GCT GGT CAC ACT TAT TTT GGG ATT 155
Leu Gly Ile Val Ser Gln Ala Trp Ala Gly His Thr Tyr Phe Gly Ile
            20                  25                  30

GAT TAC CAA TAT TAT CGT GAT TTT GCC GAG AAT AAA GGG AAG TTC ACA 203
Asp Tyr Gln Tyr Tyr Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Thr
        35                  40                  45

GTT GGG GCT CAA AAT ATT AAG GTT TAT AAC AAA CAA GGG CAA TTA GTT 251
Val Gly Ala Gln Asn Ile Lys Val Tyr Asn Lys Gln Gly Gln Leu Val
    50                  55                  60

GGC ACA TCA ATG ACA AAA GCC CCG ATG ATT GAT TTT TCT GTA GTG TCA 299
Gly Thr Ser Met Thr Lys Ala Pro Met Ile Asp Phe Ser Val Val Ser
65                  70                  75                  80

CGT AAC GGC GTG GCA GCC TTG GTT GAA AAT CAA TAT ATT GTG AGC GTG 347
Arg Asn Gly Val Ala Ala Leu Val Glu Asn Gln Tyr Ile Val Ser Val
                85                  90                  95

GCA CAT AAC GTA GGA TAT ACA GAT GTT GAT TTT GGT GCA GAG GGA AAC 395
Ala His Asn Val Gly Tyr Thr Asp Val Asp Phe Gly Ala Glu Gly Asn
            100                 105                 110

AAC CCC GAT CAA CAT CGT TTT ACT TAT AAG ATT GTA AAA CGA AAT AAC 443
Asn Pro Asp Gln His Arg Phe Thr Tyr Lys Ile Val Lys Arg Asn Asn
        115                 120                 125

TAC AAA AAA GAT AAT TTA CAT CCT TAT GAG GAC GAT TAC CAT AAT CCA 491
Tyr Lys Lys Asp Asn Leu His Pro Tyr Glu Asp Asp Tyr His Asn Pro
    130                 135                 140

CGA TTA CAT AAA TTC GTT ACA GAA GCG GCT CCA ATT GAT ATG ACT TCG 539
Arg Leu His Lys Phe Val Thr Glu Ala Ala Pro Ile Asp Met Thr Ser
145                 150                 155                 160

AAT ATG AAT GGC AGT ACT TAT TCA GAT AGA ACA AAA TAT CCA GAA CGT 587
Asn Met Asn Gly Ser Thr Tyr Ser Asp Arg Thr Lys Tyr Pro Glu Arg
                165                 170                 175

GTT CGT ATC GGC TCT GGA CGG CAG TTT TGG CGA AAT GAT CAA GAC AAA 635
Val Arg Ile Gly Ser Gly Arg Gln Phe Trp Arg Asn Asp Gln Asp Lys
            180                 185                 190

GGC GAC CAA GTT GCC GGT GCA TAT CAT TAT CTG ACA GCT GGC AAT ACA 683
Gly Asp Gln Val Ala Gly Ala Tyr His Tyr Leu Thr Ala Gly Asn Thr
        195                 200                 205

CAC AAT CAG CGT GGA GCA GGT AAT GGA TAT TCG TAT TTG GGA GGC GAT 731
His Asn Gln Arg Gly Ala Gly Asn Gly Tyr Ser Tyr Leu Gly Gly Asp
    210                 215                 220

GTT CGT AAA GCG GGA GAA TAT GGT CCA TTA CCG ATT GCA GGC TCA AAG 779
Val Arg Lys Ala Gly Glu Tyr Gly Pro Leu Pro Ile Ala Gly Ser Lys
225                 230                 235                 240

GGG GAC AGT GGT TCT CCG ATG TTT ATT TAT GAT GCT GAA AAA CAA AAA 827
Gly Asp Ser Gly Ser Pro Met Phe Ile Tyr Asp Ala Glu Lys Gln Lys
                245                 250                 255

TGG TTA ATT AAT GGG ATA TTA CGG GAA GGC AAC CCT TTT GAA GGC AAA 875
Trp Leu Ile Asn Gly Ile Leu Arg Glu Gly Asn Pro Phe Glu Gly Lys
            260                 265                 270

GAA AAT GGG TTT CAA TTG GTT CGC AAA TCT TAT TTT GAT GAA ATT TTC 923
Glu Asn Gly Phe Gln Leu Val Arg Lys Ser Tyr Phe Asp Glu Ile Phe
        275                 280                 285

GAA AGA GAT TTA CAT ACA TCA CTT TAC ACC CGA GCT GGT AAT GGA GTG 971
Glu Arg Asp Leu His Thr Ser Leu Tyr Thr Arg Ala Gly Asn Gly Val
    290                 295                 300

TAC ACA ATT AGT GGA AAT GAT AAT GGT CAG GGG TCT ATA ACT CAG AAA 1019
Tyr Thr Ile Ser Gly Asn Asp Asn Gly Gln Gly Ser Ile Thr Gln Lys
305                 310                 315                 320

TCA GGA ATA CCA TCA GAA ATT AAA ATT ACG TTA GCA AAT ATG AGT TTA 1067
Ser Gly Ile Pro Ser Glu Ile Lys Ile Thr Leu Ala Asn Met Ser Leu
                325                 330                 335

CCT TTG AAA GAG AAG GAT AAA GTT CAT AAT CCT AGA TAT GAC GGA CCT 1115
Pro Leu Lys Glu Lys Asp Lys Val His Asn Pro Arg Tyr Asp Gly Pro
            340                 345                 350

AAT ATT TAT TCT CCA CGT TTA AAC AAT GGA GAA ACG CTA TAT TTT ATG 1163
Asn Ile Tyr Ser Pro Arg Leu Asn Asn Gly Glu Thr Leu Tyr Phe Met
        355                 360                 365

GAT CAA AAA CAA GGA TCA TTA ATC TTC GCA TCT GAC ATT AAC CAA GGG 1211
Asp Gln Lys Gln Gly Ser Leu Ile Phe Ala Ser Asp Ile Asn Gln Gly
    370                 375                 380

GCG GGT GGT CTT TAT TTT GAG GGT AAT TTT ACA GTA TCT CCA AAT TCT 1259
Ala Gly Gly Leu Tyr Phe Glu Gly Asn Phe Thr Val Ser Pro Asn Ser
385                 390                 395                 400

AAC CAA ACT TGG CAA GGA GCT GGC ATA CAT GTA AGT GAA AAT AGC ACC 1307
Asn Gln Thr Trp Gln Gly Ala Gly Ile His Val Ser Glu Asn Ser Thr
                405                 410                 415

GTT ACT TGG AAA GTA AAT GGC GTG GAA CAT GAT CGA CTT TCT AAA ATT 1355
Val Thr Trp Lys Val Asn Gly Val Glu His Asp Arg Leu Ser Lys Ile
            420                 425                 430

GGT AAA GGA ACA TTG CAC GTT CAA GCC AAA GGG GAA AAT AAA GGT TCG 1403
Gly Lys Gly Thr Leu His Val Gln Ala Lys Gly Glu Asn Lys Gly Ser
        435                 440                 445

ATC AGC GTA GGC GAT GGT AAA GTC ATT TTG GAG CAG CAG GCA GAC GAT 1451
Ile Ser Val Gly Asp Gly Lys Val Ile Leu Glu Gln Gln Ala Asp Asp
    450                 455                 460

CAA GGC AAC AAA CAA GCC TTT AGT GAA ATT GGC TTG GTT AGC GGC AGA 1499
Gln Gly Asn Lys Gln Ala Phe Ser Glu Ile Gly Leu Val Ser Gly Arg
465                 470                 475                 480

GGG ACT GTT CAA TTA AAC GAT GAT AAA CAA TTT GAT ACC GAT AAA TTT 1547
Gly Thr Val Gln Leu Asn Asp Asp Lys Gln Phe Asp Thr Asp Lys Phe
                485                 490                 495

TAT TTC GGC TTT CGT GGT GGT CGC TTA GAT CTT AAC GGG CAT TCA TTA 1595
Tyr Phe Gly Phe Arg Gly Gly Arg Leu Asp Leu Asn Gly His Ser Leu
            500                 505                 510

ACC TTT AAA CGT ATC CAA AAT ACG GAC GAG GGG GCA ATG ATT GTG AAC 1643
Thr Phe Lys Arg Ile Gln Asn Thr Asp Glu Gly Ala Met Ile Val Asn
        515                 520                 525

CAT AAT ACA ACT CAA GCC GCT AAT GTC ACT ATT ACT GGG AAC GAA AGC 1691
His Asn Thr Thr Gln Ala Ala Asn Val Thr Ile Thr Gly Asn Glu Ser
    530                 535                 540

ATT GTT CTA CCT AAT GGA AAT AAT ATT AAT AAA CTT GAT TAC AGA AAA 1739
Ile Val Leu Pro Asn Gly Asn Asn Ile Asn Lys Leu Asp Tyr Arg Lys
545                 550                 555                 560

GAA ATT GCC TAC AAC GGT TGG TTT GGC GAA ACA GAT AAA AAT AAA CAC 1787
Glu Ile Ala Tyr Asn Gly Trp Phe Gly Glu Thr Asp Lys Asn Lys His
                565                 570                 575

AAT GGG CGA TTA AAC CTT ATT TAT AAA CCA ACC ACA GAA GAT CGT ACT 1835
Asn Gly Arg Leu Asn Leu Ile Tyr Lys Pro Thr Thr Glu Asp Arg Thr
            580                 585                 590

TTG CTA CTT TCA GGT GGT ACA AAT TTA AAA GGC GAT ATT ACC CAA ACA 1883
Leu Leu Leu Ser Gly Gly Thr Asn Leu Lys Gly Asp Ile Thr Gln Thr
        595                 600                 605

AAA GGT AAA CTA TTT TTC AGC GGT AGA CCG ACA CCG CAC GCC TAC AAT 1931
Lys Gly Lys Leu Phe Phe Ser Gly Arg Pro Thr Pro His Ala Tyr Asn
    610                 615                 620

CAT TTA AAT AAA CGT TGG TCA GAA ATG GAA GGT ATA CCA CAA GGC GAA 1979
His Leu Asn Lys Arg Trp Ser Glu Met Glu Gly Ile Pro Gln Gly Glu
625                 630                 635                 640

ATT GTG TGG GAT CAC GAT TGG ATC AAC CGT ACA TTT AAA GCT GAA AAC 2027
Ile Val Trp Asp His Asp Trp Ile Asn Arg Thr Phe Lys Ala Glu Asn
                645                 650                 655

TTC CAA ATT AAA GGC GGA AGT GCG GTG GTT TCT CGC AAT GTT TCT TCA 2075
Phe Gln Ile Lys Gly Gly Ser Ala Val Val Ser Arg Asn Val Ser Ser
            660                 665                 670

ATT GAG GGA AAT TGG ACA GTC AGC AAT AAT GCA AAT GCC ACA TTT GGT 2123
Ile Glu Gly Asn Trp Thr Val Ser Asn Asn Ala Asn Ala Thr Phe Gly
        675                 680                 685

GTT GTG CCA AAT CAA CAA AAT ACC ATT TGC ACG CGT TCA GAT TGG ACA 2171
Val Val Pro Asn Gln Gln Asn Thr Ile Cys Thr Arg Ser Asp Trp Thr
    690                 695                 700

GGA TTA ACG ACT TGT CAA AAA GTG GAT TTA ACC GAT ACA AAA GTT ATT 2219
Gly Leu Thr Thr Cys Gln Lys Val Asp Leu Thr Asp Thr Lys Val Ile
705                 710                 715                 720

AAT TCT ATA CCA AAA ACA CAA ATC AAT GGC TCT ATT AAT TTA ACT GAT 2267
Asn Ser Ile Pro Lys Thr Gln Ile Asn Gly Ser Ile Asn Leu Thr Asp
                725                 730                 735

AAT GCA ACG GCG AAT GTT AAA GGT TTA GCA AAA CTT AAT GGC AAT GTC 2315
Asn Ala Thr Ala Asn Val Lys Gly Leu Ala Lys Leu Asn Gly Asn Val
            740                 745                 750

ACT TTA ACA AAT CAC AGC CAA TTT ACA TTA AGC AAC AAT GCC ACC CAA 2363
Thr Leu Thr Asn His Ser Gln Phe Thr Leu Ser Asn Asn Ala Thr Gln
        755                 760                 765

ATA GGC AAT ATT CGA CTT TCC GAC AAT TCA ACT GCA ACG GTG GAT AAT 2411
Ile Gly Asn Ile Arg Leu Ser Asp Asn Ser Thr Ala Thr Val Asp Asn
    770                 775                 780

GCA AAC TTG AAC GGT AAT GTG CAT TTA ACG GAT TCA GCT CAA TTT TCT 2459
Ala Asn Leu Asn Gly Asn Val His Leu Thr Asp Ser Ala Gln Phe Ser
785                 790                 795                 800

TTA AAA AAC AGC CAT TTT TCG CAC CAA ATT CAG GGA GAC AAA GGC ACA 2507
Leu Lys Asn Ser His Phe Ser His Gln Ile Gln Gly Asp Lys Gly Thr
                805                 810                 815

ACA GTG ACG TTG GAA AAT GCG ACT TGG ACA ATG CCT AGC GAT ACT ACA 2555
Thr Val Thr Leu Glu Asn Ala Thr Trp Thr Met Pro Ser Asp Thr Thr
            820                 825                 830

TTG CAG AAT TTA ACG CTA AAT AAC AGT ACG ATC ACG TTA AAT TCA GCT 2603
Leu Gln Asn Leu Thr Leu Asn Asn Ser Thr Ile Thr Leu Asn Ser Ala
        835                 840                 845

TAT TCA GCT AGC TCA AAC AAT ACG CCA CGT CGC CGT TCA TTA GAG ACG 2651
Tyr Ser Ala Ser Ser Asn Asn Thr Pro Arg Arg Arg Ser Leu Glu Thr
    850                 855                 860

GAA ACA ACG CCA ACA TCG GCA GAA CAT CGT TTC AAC ACA TTG ACA GTA 2699
Glu Thr Thr Pro Thr Ser Ala Glu His Arg Phe Asn Thr Leu Thr Val
865                 870                 875                 880

AAT GGT AAA TTG AGT GGG CAA GGC ACA TTC CAA TTT ACT TCA TCT TTA 2747
Asn Gly Lys Leu Ser Gly Gln Gly Thr Phe Gln Phe Thr Ser Ser Leu
                885                 890                 895

TTT GGC TAT AAA AGC GAT AAA TTA AAA TTA TCC AAT GAC GCT GAG GGC 2795
Phe Gly Tyr Lys Ser Asp Lys Leu Lys Leu Ser Asn Asp Ala Glu Gly
            900                 905                 910

GAT TAC ATA TTA TCT GTT CGC AAC ACA GGC AAA GAA CCC GAA ACC CTT 2843
Asp Tyr Ile Leu Ser Val Arg Asn Thr Gly Lys Glu Pro Glu Thr Leu
        915                 920                 925

GAG CAA TTA ACT TTG GTT GAA AGC AAA GAT AAT CAA CCG TTA TCA GAT 2891
Glu Gln Leu Thr Leu Val Glu Ser Lys Asp Asn Gln Pro Leu Ser Asp
    930                 935                 940

AAG CTC AAA TTT ACT TTA GAA AAT GAC CAC GTT GAT GCA GGT GCA TTA 2939
Lys Leu Lys Phe Thr Leu Glu Asn Asp His Val Asp Ala Gly Ala Leu
945                 950                 955                 960

CGT TAT AAA TTA GTG AAG AAT GAT GGC GAA TTC CGC TTG CAT AAC CCA 2987
Arg Tyr Lys Leu Val Lys Asn Asp Gly Glu Phe Arg Leu His Asn Pro
                965                 970                 975

ATA AAA GAG CAG GAA TTG CAC AAT GAT TTA GTA AGA GCA GAG CAA GCA 3035
Ile Lys Glu Gln Glu Leu His Asn Asp Leu Val Arg Ala Glu Gln Ala
            980                 985                 990

GAA CGA ACA TTA GAA GCC AAA CAA GTT GAA CCG ACT GCT AAA ACA CAA 3083
Glu Arg Thr Leu Glu Ala Lys Gln Val Glu Pro Thr Ala Lys Thr Gln
        995                 1000                1005

ACA GGT GAG CCA AAA GTG CGG TCA AGA AGA GCA GCG AGA GCA GCG TTT 3131
Thr Gly Glu Pro Lys Val Arg Ser Arg Arg Ala Ala Arg Ala Ala Phe
    1010                1015                1020

CCT GAT ACC CTG CCT GAT CAA AGC CTG TTA AAC GCA TTA GAA GCC AAA 3179
Pro Asp Thr Leu Pro Asp Gln Ser Leu Leu Asn Ala Leu Glu Ala Lys
1025                1030                1035                1040




CAA GCT GAA CTG ACT GCT GAA ACA CAA AAA AGT AAG GCA AAA ACA AAA 3227 
Gin Ala Glu Leu Thr Ala Glu Thr Gin Lys Ser Lys Ala Lys Thr Lys 
1045 1050 1055 

AAA GTG CGG TCA AAA AGA GCA GTG TTT TCT GAT CCC CTG CTT GAT CAA 3275 
Lys Val Arg Ser Lys Arg Ala Val Phe Ser Asp Pro Leu Leu Asp Gin 
1060 1065 1070 

AGC CTG TTC GCA TTA GAA GCC GCA CTT GAG GTT ATT GAT GCC CCA CAG 332 3 

Ser Leu Phe Ala Leu Glu Ala Ala Leu Glu Val lie Asp Ala Pro Gin 
1075 1080 1085 

CAA TCG GAA AAA GAT CGT CTA GCT CAA GAA GAA GCG GAA AAA CAA CGC 3371 
Gin Ser Glu Lys Asp Arg Leu Ala Gin Glu Glu Ala Glu Lys Gin Arg 
1090 1095 1100 

AAA CAA AAA GAC TTG ATC AGC CGT TAT TCA AAT AGT GCG TTA TCA GAA 3419 
Lys Gin Lys Asp Leu lie Ser Arg Tyr Ser Asn Ser Ala Leu Ser Glu 
1105 1110 1115 1120 

TTA TCT GCA ACA GTA AAT AGT ATG CTT TCT GTT CAA GAT GAA TTA GAT 34 6 7 

Leu Ser Ala Thr Val Asn Ser Met Leu Ser Val Gin Asp Glu Leu Asp 
1125 1130 ~ 1135 

CGT CTT TTT GTA GAT CAA GCA CAA TCT GCC GTG TGG ACA AAT ATC GCA 3515 
Arg Leu Phe Val Asp Gin Ala Gin Ser Ala Val Trp Thr Asn He Ala 
1140 1145 1150 

CAG GAT AAA AGA CGC TAT GAT TCT GAT GCG TTC CGT GCT TAT CAG CAG 3563 
Gin Asp Lys Arg Arg Tyr Asp Ser Asp Ala Phe Arg Ala Tyr Gin Gin 
1155 1160 1165 

CAG AAA ACG AAC TTA CGT CAA ATT GGG GTG CAA AAA GCC TTA GCT AAT 3611 
Gin Lys Thr Asn Leu Arg Gin He Gly Val Gin Lys Ala Leu Ala Asn 
1170 1175 1180 

GGA CGA ATT GGG GCA GTT TTC TCG CAT AGC CGT TCA GAT AAT ACC TTT 3659 
Gly Arg He Gly Ala Val Phe Ser His Ser Arg Ser Asp Asn Thr Phe 
1185 1190 1195 1200 

GAT GAA CAG GTT AAA AAT CAC GCG ACA TTA ACG ATG ATG TCG GGT TTT 3707 
Asp Glu Gin Val Lys Asn His Ala Thr Leu Thr Met Met Ser Gly Phe 
1205 1210 1215 

GCC CAA TAT CAA TGG GGC GAT TTA CAA TTT GGT GTA AAC GTG GGA ACG 3755
Ala Gln Tyr Gln Trp Gly Asp Leu Gln Phe Gly Val Asn Val Gly Thr
1220 1225 1230

GGA ATC AGT GCG AGT AAA ATG GCT GAA GAA CAA AGC CGA AAA ATT CAT 3803
Gly Ile Ser Ala Ser Lys Met Ala Glu Glu Gln Ser Arg Lys Ile His
1235 1240 1245

CGA AAA GCG ATA AAT TAT GGC GTG AAT GCA AGT TAT CAG TTC CGT TTA 3851
Arg Lys Ala Ile Asn Tyr Gly Val Asn Ala Ser Tyr Gln Phe Arg Leu
1250 1255 1260

GGG CAA TTG GGC ATT CAG CCT TAT TTT GGA GTT AAT CGC TAT TTT ATT 3899
Gly Gln Leu Gly Ile Gln Pro Tyr Phe Gly Val Asn Arg Tyr Phe Ile
1265 1270 1275 1280

GAA CGT GAA AAT TAT CAA TCT GAG GAA GTG AGA GTG AAA ACG CCT AGC 3947
Glu Arg Glu Asn Tyr Gln Ser Glu Glu Val Arg Val Lys Thr Pro Ser
1285 1290 1295

CTT GCA TTT AAT CGC TAT AAT GCT GGC ATT CGA GTT GAT TAT ACA TTT 3995
Leu Ala Phe Asn Arg Tyr Asn Ala Gly Ile Arg Val Asp Tyr Thr Phe
1300 1305 1310

ACT CCG ACA GAT AAT ATC AGC GTT AAG CCT TAT TTC TTC GTC AAT TAT 4043
Thr Pro Thr Asp Asn Ile Ser Val Lys Pro Tyr Phe Phe Val Asn Tyr
1315 1320 1325

GTT GAT GTT TCA AAC GCT AAC GTA CAA ACC ACG GTA AAT CTC ACG GTG 4091
Val Asp Val Ser Asn Ala Asn Val Gln Thr Thr Val Asn Leu Thr Val
1330 1335 1340

TTG CAA CAA CCA TTT GGA CGT TAT TGG CAA AAA GAA GTG GGA TTA AAG 4139
Leu Gln Gln Pro Phe Gly Arg Tyr Trp Gln Lys Glu Val Gly Leu Lys
1345 1350 1355 1360

GCA GAA ATT TTA CAT TTC CAA ATT TCC GCT TTT ATC TCA AAA TCT CAA 4187
Ala Glu Ile Leu His Phe Gln Ile Ser Ala Phe Ile Ser Lys Ser Gln
1365 1370 1375

GGT TCA CAA CTC GGC AAA CAG CAA AAT GTG GGC GTG AAA TTG GGC TAT 4235
Gly Ser Gln Leu Gly Lys Gln Gln Asn Val Gly Val Lys Leu Gly Tyr
1380 1385 1390

CGT TGG TAAAAATCAA CATAATTTTA TCGTTTATTG ATAAACAAGG TGGGTCAGAT 4291
Arg Trp

CAGATCCCAC CTTTTTTATT CCAATAAT 4319

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1394 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Lys Lys Thr Val Phe Arg Leu Asn Phe Leu Thr Ala Cys Ile Ser
1 5 10 15

Leu Gly Ile Val Ser Gln Ala Trp Ala Gly His Thr Tyr Phe Gly Ile
20 25 30

Asp Tyr Gln Tyr Tyr Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Thr
35 40 45

Val Gly Ala Gln Asn Ile Lys Val Tyr Asn Lys Gln Gly Gln Leu Val
50 55 60

Gly Thr Ser Met Thr Lys Ala Pro Met Ile Asp Phe Ser Val Val Ser
65 70 75 80

Arg Asn Gly Val Ala Ala Leu Val Glu Asn Gln Tyr Ile Val Ser Val
85 90 95

Ala His Asn Val Gly Tyr Thr Asp Val Asp Phe Gly Ala Glu Gly Asn
100 105 110

Asn Pro Asp Gln His Arg Phe Thr Tyr Lys Ile Val Lys Arg Asn Asn
115 120 125



WO 96/05858 PCT/US95/10661 




Tyr Lys Lys Asp Asn Leu His Pro Tyr Glu Asp Asp Tyr His Asn Pro
130 135 140

Arg Leu His Lys Phe Val Thr Glu Ala Ala Pro Ile Asp Met Thr Ser
145 150 155 160

Asn Met Asn Gly Ser Thr Tyr Ser Asp Arg Thr Lys Tyr Pro Glu Arg
165 170 175

Val Arg Ile Gly Ser Gly Arg Gln Phe Trp Arg Asn Asp Gln Asp Lys
180 185 190

Gly Asp Gln Val Ala Gly Ala Tyr His Tyr Leu Thr Ala Gly Asn Thr
195 200 205

His Asn Gln Arg Gly Ala Gly Asn Gly Tyr Ser Tyr Leu Gly Gly Asp
210 215 220

Val Arg Lys Ala Gly Glu Tyr Gly Pro Leu Pro Ile Ala Gly Ser Lys
225 230 235 240

Gly Asp Ser Gly Ser Pro Met Phe Ile Tyr Asp Ala Glu Lys Gln Lys
245 250 255

Trp Leu Ile Asn Gly Ile Leu Arg Glu Gly Asn Pro Phe Glu Gly Lys
260 265 270

Glu Asn Gly Phe Gln Leu Val Arg Lys Ser Tyr Phe Asp Glu Ile Phe
275 280 285

Glu Arg Asp Leu His Thr Ser Leu Tyr Thr Arg Ala Gly Asn Gly Val
290 295 300

Tyr Thr Ile Ser Gly Asn Asp Asn Gly Gln Gly Ser Ile Thr Gln Lys
305 310 315 320

Ser Gly Ile Pro Ser Glu Ile Lys Ile Thr Leu Ala Asn Met Ser Leu
325 330 335

Pro Leu Lys Glu Lys Asp Lys Val His Asn Pro Arg Tyr Asp Gly Pro
340 345 350

Asn Ile Tyr Ser Pro Arg Leu Asn Asn Gly Glu Thr Leu Tyr Phe Met
355 360 365

Asp Gln Lys Gln Gly Ser Leu Ile Phe Ala Ser Asp Ile Asn Gln Gly
370 375 380

Ala Gly Gly Leu Tyr Phe Glu Gly Asn Phe Thr Val Ser Pro Asn Ser 
385 390 395 400 

Asn Gln Thr Trp Gln Gly Ala Gly Ile His Val Ser Glu Asn Ser Thr
405 410 415

Val Thr Trp Lys Val Asn Gly Val Glu His Asp Arg Leu Ser Lys Ile
420 425 430

Gly Lys Gly Thr Leu His Val Gln Ala Lys Gly Glu Asn Lys Gly Ser
435 440 445

Ile Ser Val Gly Asp Gly Lys Val Ile Leu Glu Gln Gln Ala Asp Asp
450 455 460

Gln Gly Asn Lys Gln Ala Phe Ser Glu Ile Gly Leu Val Ser Gly Arg
465 470 475 480

Gly Thr Val Gln Leu Asn Asp Asp Lys Gln Phe Asp Thr Asp Lys Phe
485 490 495

Tyr Phe Gly Phe Arg Gly Gly Arg Leu Asp Leu Asn Gly His Ser Leu 
500 505 510 

Thr Phe Lys Arg Ile Gln Asn Thr Asp Glu Gly Ala Met Ile Val Asn
515 520 525

His Asn Thr Thr Gln Ala Ala Asn Val Thr Ile Thr Gly Asn Glu Ser
530 535 540

Ile Val Leu Pro Asn Gly Asn Asn Ile Asn Lys Leu Asp Tyr Arg Lys
545 550 555 560

Glu Ile Ala Tyr Asn Gly Trp Phe Gly Glu Thr Asp Lys Asn Lys His
565 570 575

Asn Gly Arg Leu Asn Leu Ile Tyr Lys Pro Thr Thr Glu Asp Arg Thr
580 585 590

Leu Leu Leu Ser Gly Gly Thr Asn Leu Lys Gly Asp Ile Thr Gln Thr
595 600 605

Lys Gly Lys Leu Phe Phe Ser Gly Arg Pro Thr Pro His Ala Tyr Asn 
610 615 620 

His Leu Asn Lys Arg Trp Ser Glu Met Glu Gly Ile Pro Gln Gly Glu
625 630 635 640

Ile Val Trp Asp His Asp Trp Ile Asn Arg Thr Phe Lys Ala Glu Asn
645 650 655

Phe Gln Ile Lys Gly Gly Ser Ala Val Val Ser Arg Asn Val Ser Ser
660 665 670

Ile Glu Gly Asn Trp Thr Val Ser Asn Asn Ala Asn Ala Thr Phe Gly
675 680 685

Val Val Pro Asn Gln Gln Asn Thr Ile Cys Thr Arg Ser Asp Trp Thr
690 695 700

Gly Leu Thr Thr Cys Gln Lys Val Asp Leu Thr Asp Thr Lys Val Ile
705 710 715 720

Asn Ser Ile Pro Lys Thr Gln Ile Asn Gly Ser Ile Asn Leu Thr Asp
725 730 735

Asn Ala Thr Ala Asn Val Lys Gly Leu Ala Lys Leu Asn Gly Asn Val 
740 745 750 

Thr Leu Thr Asn His Ser Gln Phe Thr Leu Ser Asn Asn Ala Thr Gln
755 760 765

Ile Gly Asn Ile Arg Leu Ser Asp Asn Ser Thr Ala Thr Val Asp Asn
770 775 780

Ala Asn Leu Asn Gly Asn Val His Leu Thr Asp Ser Ala Gln Phe Ser
785 790 795 800

Leu Lys Asn Ser His Phe Ser His Gln Ile Gln Gly Asp Lys Gly Thr
805 810 815

Thr Val Thr Leu Glu Asn Ala Thr Trp Thr Met Pro Ser Asp Thr Thr
820 825 830

Leu Gln Asn Leu Thr Leu Asn Asn Ser Thr Ile Thr Leu Asn Ser Ala
835 840 845

Tyr Ser Ala Ser Ser Asn Asn Thr Pro Arg Arg Arg Ser Leu Glu Thr
850 855 860

Glu Thr Thr Pro Thr Ser Ala Glu His Arg Phe Asn Thr Leu Thr Val
865 870 875 880

Asn Gly Lys Leu Ser Gly Gln Gly Thr Phe Gln Phe Thr Ser Ser Leu
885 890 895

Phe Gly Tyr Lys Ser Asp Lys Leu Lys Leu Ser Asn Asp Ala Glu Gly
900 905 910

Asp Tyr Ile Leu Ser Val Arg Asn Thr Gly Lys Glu Pro Glu Thr Leu
915 920 925

Glu Gln Leu Thr Leu Val Glu Ser Lys Asp Asn Gln Pro Leu Ser Asp
930 935 940

Lys Leu Lys Phe Thr Leu Glu Asn Asp His Val Asp Ala Gly Ala Leu
945 950 955 960

Arg Tyr Lys Leu Val Lys Asn Asp Gly Glu Phe Arg Leu His Asn Pro
965 970 975

Ile Lys Glu Gln Glu Leu His Asn Asp Leu Val Arg Ala Glu Gln Ala
980 985 990

Glu Arg Thr Leu Glu Ala Lys Gln Val Glu Pro Thr Ala Lys Thr Gln
995 1000 1005

Thr ?^« Glu Pro Lys Val *** Ser Arg Arg Ala Ala Arg Ala Ala Phe
1010 1015 1020 

Pro Asp Thr Leu Pro Asp Gln Ser Leu Leu Asn Ala Leu Glu Ala Lys
1025 1030 1035 1040

Gln Ala Glu Leu Thr Ala Glu Thr Gln Lys Ser Lys Ala Lys Thr Lys
1045 1050 1055

Lys Val Arg Ser Lys Arg Ala Val Phe Ser Asp Pro Leu Leu Asp Gln
1060 1065 1070

Ser Leu Phe Ala Leu Glu Ala Ala Leu Glu Val Ile Asp Ala Pro Gln
1075 1080 1085

Gln Ser Glu Lys Asp Arg Leu Ala Gln Glu Glu Ala Glu Lys Gln Arg
1090 1095 1100

Lys Gln Lys Asp Leu Ile Ser Arg Tyr Ser Asn Ser Ala Leu Ser Glu
1105 1110 1115 1120

Leu Ser Ala Thr Val Asn Ser Met Leu Ser Val Gln Asp Glu Leu Asp
1125 1130 1135

Arg Leu Phe Val Asp Gln Ala Gln Ser Ala Val Trp Thr Asn Ile Ala
1140 1145 1150

Gln Asp Lys Arg Arg Tyr Asp Ser Asp Ala Phe Arg Ala Tyr Gln Gln
1155 1160 1165

Gln Lys Thr Asn Leu Arg Gln Ile Gly Val Gln Lys Ala Leu Ala Asn
1170 1175 1180

Gly Arg Ile Gly Ala Val Phe Ser His Ser Arg Ser Asp Asn Thr Phe
1185 1190 1195 1200

Asp Glu Gln Val Lys Asn His Ala Thr Leu Thr Met Met Ser Gly Phe
1205 1210 1215

Ala Gln Tyr Gln Trp Gly Asp Leu Gln Phe Gly Val Asn Val Gly Thr
1220 1225 1230

Gly Ile Ser Ala Ser Lys Met Ala Glu Glu Gln Ser Arg Lys Ile His
1235 1240 1245

Arg Lys Ala Ile Asn Tyr Gly Val Asn Ala Ser Tyr Gln Phe Arg Leu
1250 1255 1260

Gly Gln Leu Gly Ile Gln Pro Tyr Phe Gly Val Asn Arg Tyr Phe Ile
1265 1270 1275 1280

Glu Arg Glu Asn Tyr Gln Ser Glu Glu Val Arg Val Lys Thr Pro Ser
1285 1290 1295

Leu Ala Phe Asn Arg Tyr Asn Ala Gly Ile Arg Val Asp Tyr Thr Phe
1300 1305 1310

Thr Pro Thr Asp Asn Ile Ser Val Lys Pro Tyr Phe Phe Val Asn Tyr
1315 1320 1325

Val Asp Val Ser Asn Ala Asn Val Gln Thr Thr Val Asn Leu Thr Val
1330 1335 1340

Leu Gln Gln Pro Phe Gly Arg Tyr Trp Gln Lys Glu Val Gly Leu Lys
1345 1350 1355 1360

Ala Glu Ile Leu His Phe Gln Ile Ser Ala Phe Ile Ser Lys Ser Gln
1365 1370 1375

Gly Ser Gln Leu Gly Lys Gln Gln Asn Val Gly Val Lys Leu Gly Tyr
1380 1385 1390

Arg Trp 

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1541 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Leu Asn Lys Lys Phe Lys Leu Asn Phe Ile Ala Leu Thr Val Ala
1 5 10 15

Tyr Ala Leu Thr Pro Tyr Thr Glu Ala Ala Leu Val Arg Asp Asp Val
20 25 30

Asp Tyr Gln Ile Phe Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Ser
35 40 45

Val Gly Ala Thr Asn Val Leu Val Lys Asp Lys Asn Asn Lys Asp Leu
50 55 60

Gly Thr Ala Leu Pro Asn Gly Ile Pro Met Ile Asp Phe Ser Val Val
65 70 75 80

Asp Val Asp Lys Arg Ile Ala Thr Leu Ile Asn Pro Gln Tyr Val Val
85 90 95

Gly Val Lys His Val Ser Asn Gly Val Ser Glu Leu His Phe Gly Asn
100 105 110

Leu Asn Gly Asn Met Asn Asn Gly Asn Ala Lys Ala His Arg Asp Val
115 120 125

Ser Ser Glu Glu Asn Arg Tyr Phe Ser Val Glu Lys Asn Glu Tyr Pro
130 135 140

Thr Lys Leu Asn Gly Lys Thr Val Thr Thr Glu Asp Gln Thr Gln Lys
145 150 155 160

Arg Arg Glu Asp Tyr Tyr Met Pro Arg Leu Asp Lys Phe Val Thr Glu
165 170 175

Val Ala Pro Ile Glu Ala Ser Thr Ala Ser Ser Asp Ala Gly Thr Tyr
180 185 190

Asn Asp Gln Asn Lys Tyr Pro Ala Phe Val Arg Leu Gly Ser Gly Ser
195 200 205

Gln Phe Ile Tyr Lys Lys Gly Asp Asn Tyr Ser Leu Ile Leu Asn Asn
210 215 220

His Glu Val Gly Gly Asn Asn Leu Lys Leu Val Gly Asp Ala Tyr Thr
225 230 235 240

Tyr Gly Ile Ala Gly Thr Pro Tyr Lys Val Asn His Glu Asn Asn Gly
245 250 255

Leu Ile Gly Phe Gly Asn Ser Lys Glu Glu His Ser Asp Pro Lys Gly
260 265 270

Ile Leu Ser Gln Asp Pro Leu Thr Asn Tyr Ala Val Leu Gly Asp Ser
275 280 285

Gly Ser Pro Leu Phe Val Tyr Asp Arg Glu Lys Gly Lys Trp Leu Phe
290 295 300

Leu Gly Ser Tyr Asp Phe Trp Ala Gly Tyr Asn Lys Lys Ser Trp Gln
305 310 315 320

Glu Trp Asn Ile Tyr Lys Ser Gln Phe Thr Lys Asp Val Leu Asn Lys
325 330 335

Asp Ser Ala Gly Ser Leu Ile Gly Ser Lys Thr Asp Tyr Ser Trp Ser
340 345 350

Ser Asn Gly Lys Thr Ser Thr Ile Thr Gly Gly Glu Lys Ser Leu Asn
355 360 365

Val Asp Leu Ala Asp Gly Lys Asp Lys Pro Asn His Gly Lys Ser Val
370 375 380

Thr Phe Glu Gly Ser Gly Thr Leu Thr Leu Asn Asn Asn Ile Asp Gln
385 390 395 400

Gly Ala Gly Gly Leu Phe Phe Glu Gly Asp Tyr Glu Val Lys Gly Thr
405 410 415

Ser Asp Asn Thr Thr Trp Lys Gly Ala Gly Val Ser Val Ala Glu Gly
420 425 430

Lys Thr Val Thr Trp Lys Val His Asn Pro Gln Tyr Asp Arg Leu Ala
435 440 445

Lys Ile Gly Lys Gly Thr Leu Ile Val Glu Gly Thr Gly Asp Asn Lys
450 455 460

Gly Ser Leu Lys Val Gly Asp Gly Thr Val Ile Leu Lys Gln Gln Thr
465 470 475 480

Asn Gly Ser Gly Gln His Ala Phe Ala Ser Val Gly Ile Val Ser Gly
485 490 495

Arg Ser Thr Leu Val Leu Asn Asp Asp Lys Gln Val Asp Pro Asn Ser
500 505 510

Ile Tyr Phe Gly Phe Arg Gly Gly Arg Leu Asp Leu Asn Gly Asn Ser
515 520 525

Leu Thr Phe Asp His Ile Arg Asn Ile Asp Asp Gly Ala Arg Leu Val
530 535 540

Asn His Asn Met Thr Asn Ala Ser Asn Ile Thr Ile Thr Gly Glu Ser
545 550 555 560

Leu Ile Thr Asp Pro Asn Thr Ile Thr Pro Tyr Asn Ile Asp Ala Pro
565 570 575

Asp Glu Asp Asn Pro Tyr Ala Phe Arg Arg Ile Lys Asp Gly Gly Gln
580 585 590

Leu Tyr Leu Asn Leu Glu Asn Tyr Thr Tyr Tyr Ala Leu Arg Lys Gly
595 600 605

Ala Ser Thr Arg Ser Glu Leu Pro Lys Asn Ser Gly Glu Ser Asn Glu
610 615 620

Asn Trp Leu Tyr Met Gly Lys Thr Ser Asp Glu Ala Lys Arg Asn Val
625 630 635 640

Met Asn His Ile Asn Asn Glu Arg Met Asn Gly Phe Asn Gly Tyr Phe
645 650 655

Gly Glu Glu Glu Gly Lys Asn Asn Gly Asn Leu Asn Val Thr Phe Lys
660 665 670

Gly Lys Ser Glu Gln Asn Arg Phe Leu Leu Thr Gly Gly Thr Asn Leu
675 680 685

Asn Gly Asp Leu Thr Val Glu Lys Gly Thr Leu Phe Leu Ser Gly Arg
690 695 700

Pro Thr Pro His Ala Arg Asp Ile Ala Gly Ile Ser Ser Thr Lys Lys
705 710 715 720

Asp Pro His Phe Ala Glu Asn Asn Glu Val Val Val Glu Asp Asp Trp
725 730 735

Ile Asn Arg Asn Phe Lys Ala Thr Thr Met Asn Val Thr Gly Asn Ala
740 745 750

Ser Leu Tyr Ser Gly Arg Asn Val Ala Asn Ile Thr Ser Asn Ile Thr
755 760 765

Ala Ser Asn Lys Ala Gln Val His Ile Gly Tyr Lys Thr Gly Asp Thr
770 775 780

Val Cys Val Arg Ser Asp Tyr Thr Gly Tyr Val Thr Cys Thr Thr Asp
785 790 795 800

Lys Leu Ser Asp Lys Ala Leu Asn Ser Phe Asn Pro Thr Asn Leu Arg
805 810 815

Gly Asn Val Asn Leu Thr Glu Ser Ala Asn Phe Val Leu Gly Lys Ala
820 825 830

Asn Leu Phe Gly Thr Ile Gln Ser Arg Gly Asn Ser Gln Val Arg Leu
835 840 845

Thr Glu Asn Ser His Trp His Leu Thr Gly Asn Ser Asp Val His Gln
850 855 860

Leu Asp Leu Ala Asn Gly His Ile His Leu Asn Ser Ala Asp Asn Ser
865 870 875 880

Asn Asn Val Thr Lys Tyr Asn Thr Leu Thr Val Asn Ser Leu Ser Gly
885 890 895

Asn Gly Ser Phe Tyr Tyr Leu Thr Asp Leu Ser Asn Lys Gln Gly Asp
900 905 910

Lys Val Val Val Thr Lys Ser Ala Thr Gly Asn Phe Thr Leu Gln Val
915 920 925

Ala Asp Lys Thr Gly Glu Pro Asn His Asn Glu Leu Thr Leu Phe Asp
930 935 940

Ala Ser Lys Ala Gln Arg Asp His Leu Asn Val Ser Leu Val Gly Asn
945 950 955 960

Thr Val Asp Leu Gly Ala Trp Lys Tyr Lys Leu Arg Asn Val Asn Gly 
965 970 975 

Arg Tyr Asp Leu Tyr Asn Pro Glu Val Glu Lys Arg Asn Gln Thr Val
980 985 990

Asp Thr Thr Asn Ile Thr Thr Pro Asn Asn Ile Gln Ala Asp Val Pro
995 1000 1005

Ser Val Pro Ser Asn Asn Glu Glu Ile Ala Arg Val Asp Glu Ala Pro
1010 1015 1020

Val Pro Pro Pro Ala Pro Ala Thr Pro Ser Glu Thr Thr Glu Thr Val
1025 1030 1035 1040

Ala Glu Asn Ser Lys Gln Glu Ser Lys Thr Val Glu Lys Asn Glu Gln
1045 1050 1055

Asp Ala Thr Glu Thr Thr Ala Gln Asn Arg Glu Val Ala Lys Glu Ala
1060 1065 1070

Lys Ser Asn Val Lys Ala Asn Thr Gln Thr Asn Glu Val Ala Gln Ser
1075 1080 1085

ltlc G1U ^ ^ G1U lols Gln ^ Thr ° 1U Glu Thr Ala 



1100 

Sos^ 1 G1U LyS G1U G iYo LyS Ala LyS Val Glu Thr Glu Thr Gin 

1115 1120 




Glu Val Pro Lys Val Thr Ser Gln Val Ser Pro Lys Gln Glu Gln Ser
1125 1130 1135

Glu Thr Val Gln Pro Gln Ala Glu Pro Ala Arg Glu Asn Asp Pro Thr
1140 1145 1150

Val Asn Ile Lys Glu Pro Gln Ser Gln Thr Asn Thr Thr Ala Asp Thr
1155 1160 1165

Glu Gln Pro Ala Lys Glu Thr Ser Ser Asn Val Glu Gln Pro Val Thr
1170 1175 1180

Glu Ser Thr Thr Val Asn Thr Gly Asn Ser Val Val Glu Asn Pro Glu
1185 1190 1195 1200

Asn Thr Thr Pro Ala Thr Thr Gln Pro Thr Val Asn Ser Glu Ser Ser
1205 1210 1215

Asn Lys Pro Lys Asn Arg His Arg Arg Ser Val Arg Ser Val Pro His
1220 1225 1230

Asn Val Glu Pro Ala Thr Thr Ser Ser Asn Asp Arg Ser Thr Val Ala 
1235 1240 1245 

Leu Cys Asp Leu Thr Ser Thr Asn Thr Asn Ala Val Leu Ser Asp Ala 
1250 1255 1260 

Arg Ala Lys Ala Gln Phe Val Ala Leu Asn Val Gly Lys Ala Val Ser
1265 1270 1275 1280

Gln His Ile Ser Gln Leu Glu Met Asn Asn Glu Gly Gln Tyr Asn Val
1285 1290 1295

Trp Val Ser Asn Thr Ser Met Asn Lys Asn Tyr Ser Ser Ser Gln Tyr
1300 1305 1310

Arg Arg Phe Ser Ser Lys Ser Thr Gln Thr Gln Leu Gly Trp Asp Gln
1315 1320 1325

Thr Ile Ser Asn Asn Val Gln Leu Gly Gly Val Phe Thr Tyr Val Arg
1330 1335 1340

Asn Ser Asn Asn Phe Asp Lys Ala Thr Ser Lys Asn Thr Leu Ala Gln
1345 1350 1355 1360

Val Asn Phe Tyr Ser Lys Tyr Tyr Ala Asp Asn His Trp Tyr Leu Gly
1365 1370 1375

Ile Asp Leu Gly Tyr Gly Lys Phe Gln Ser Lys Leu Gln Thr Asn His
1380 1385 1390

Asn Ala Lys Phe Ala Arg His Thr Ala Gln Phe Gly Leu Thr Ala Gly
1395 1400 1405

Lys Ala Phe Asn Leu Gly Asn Phe Gly Ile Thr Pro Ile Val Gly Val
1410 1415 1420

Arg Tyr Ser Tyr Leu Ser Asn Ala Asp Phe Ala Leu Asp Gln Ala Arg
1425 1430 1435 1440

Ile Lys Val Asn Pro Ile Ser Val Lys Thr Ala Phe Ala Gln Val Asp
1445 1450 1455

Leu Ser Tyr Thr Tyr His Leu Gly Glu Phe Ser Val Thr Pro Ile Leu
1460 1465 1470




Ser Ala Arg Tyr Asp Ala Asn Gln Gly Ser Gly Lys Ile Asn Val Asn
1475 1480 1485

Gly Tyr Asp Phe Ala Tyr Asn Val Glu Asn Gln Gln Gln Tyr Asn Ala
1490 1495 1500

Gly Leu Lys Leu Lys Tyr His Asn Val Lys Leu Ser Leu Ile Gly Gly
1505 1510 1515 1520

Leu Thr Lys Ala Lys Gln Ala Glu Lys Gln Lys Thr Ala Glu Leu Lys
1525 1530 1535

Leu Ser Phe Ser Phe 
1540 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1545 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: unknown



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Leu Asn Lys Lys Phe Lys Leu Asn Phe Ile Ala Leu Thr Val Ala
1 5 10 15

Tyr Ala Leu Thr Pro Tyr Thr Glu Ala Ala Leu Val Arg Asp Asp Val
20 25 30

Asp Tyr Gln Ile Phe Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Ser
35 40 45

Val Gly Ala Thr Asn Val Glu Val Arg Asp Lys Asn Asn Arg Pro Leu
50 55 60

Gly Asn Val Leu Pro Asn Gly Ile Pro Met Ile Asp Phe Ser Val Val
65 70 75 80

Asp Val Asp Lys Arg Ile Ala Thr Leu Val Asn Pro Gln Tyr Val Val
85 90 95

Gly Val Lys His Val Ser Asn Gly Val Ser Glu Leu His Phe Gly Asn
100 105 110

Leu Asn Gly Asn Met Asn Asn Gly Asn Ala Lys Ala His Arg Asp Val 
115 120 125 

Ser Ser Glu Glu Asn Arg Tyr Tyr Thr Val Glu Lys Asn Glu Tyr Pro 
130 135 140 

Thr Lys Leu Asn Gly Lys Ala Val Thr Thr Glu Asp Gln Ala Gln Lys
145 150 155 160

Arg Arg Glu Asp Tyr Tyr Met Pro Arg Leu Asp Lys Phe Val Thr Glu
165 170 175

Val Ala Pro Ile Glu Ala Ser Thr Asp Ser Ser Thr Ala Gly Thr Tyr
180 185 190

Asn Asn Lys Asp Lys Tyr Pro Tyr Phe Val Arg Leu Gly Ser Gly Thr
195 200 205

Gln Phe Ile Tyr Glu Asn Gly Thr Arg Tyr Glu Leu Trp Leu Gly Lys
210 215 220

Glu Gly Gln Lys Ser Asp Ala Gly Gly Tyr Asn Leu Lys Leu Val Gly
225 230 235 240

Asn Ala Tyr Thr Tyr Gly Ile Ala Gly Thr Pro Tyr Glu Val Asn His
245 250 255

Glu Asn Asp Gly Leu Ile Gly Phe Gly Asn Ser Asn Asn Glu Tyr Ile
260 265 270

Asn Pro Lys Glu Ile Leu Ser Lys Lys Pro Leu Thr Asn Tyr Ala Val
275 280 285

Leu Gly Asp Ser Gly Ser Pro Leu Phe Val Tyr Asp Arg Glu Lys Gly
290 295 300

Lys Trp Leu Phe Leu Gly Ser Tyr Asp Tyr Trp Ala Gly Tyr Asn Lys
305 310 315 320

Lys Ser Trp Gln Glu Trp Asn Ile Tyr Lys Pro Glu Phe Ala Glu Lys
325 330 335

Ile Tyr Glu Gln Tyr Ser Ala Gly Ser Leu Ile Gly Ser Lys Thr Asp
340 345 350

Tyr Ser Trp Ser Ser Asn Gly Lys Thr Ser Thr Ile Thr Gly Gly Glu
355 360 365

Lys Ser Leu Asn Val Asp Leu Ala Asp Gly Lys Asp Lys Pro Asn His
370 375 380

Gly Lys Ser Val Thr Phe Glu Gly Ser Gly Thr Leu Thr Leu Asn Asn
385 390 395 400

Asn Ile Asp Gln Gly Ala Gly Gly Leu Phe Phe Glu Gly Asp Tyr Glu
405 410 415

Val Lys Gly Thr Ser Asp Asn Thr Thr Trp Lys Gly Ala Gly Val Ser
420 425 430

Val Ala Glu Gly Lys Thr Val Thr Trp Lys Val His Asn Pro Gln Tyr
435 440 445

Asp Arg Leu Ala Lys Ile Gly Lys Gly Thr Leu Ile Val Glu Gly Thr
450 455 460

Gly Asp Asn Lys Gly Ser Leu Lys Val Gly Asp Gly Thr Val Ile Leu
465 470 475 480

Lys Gln Gln Thr Asn Gly Ser Gly Gln His Ala Phe Ala Ser Val Gly
485 490 495

Ile Val Ser Gly Arg Ser Thr Leu Val Leu Asn Asp Asp Lys Gln Val
500 505 510

Asp Pro Asn Ser Ile Tyr Phe Gly Phe Arg Gly Gly Arg Leu Asp Leu
515 520 525

Asn Gly Asn Ser Leu Thr Phe Asp His Ile Arg Asn Ile Asp Glu Gly
530 535 540

Ala Arg Leu Val Asn His Ser Thr Ser Lys His Ser Thr Val Thr Ile
545 550 555 560

Thr Gly Asp Asn Leu Ile Thr Asp Pro Asn Asn Val Ser Ile Tyr Tyr
565 570 575

Val Lys Pro Leu Glu Asp Asp Asn Pro Tyr Ala Ile Arg Gln Ile Lys
580 585 590

Tyr Gly Tyr Gln Leu Tyr Phe Asn Glu Glu Asn Arg Thr Tyr Tyr Ala
595 600 605

Leu Lys Lys Asp Ala Ser Ile Arg Ser Glu Phe Pro Gln Asn Arg Gly
610 615 620

Glu Ser Asn Asn Ser Trp Leu Tyr Met Gly Thr Glu Lys Ala Asp Ala
625 630 635 640

Gln Lys Asn Ala Met Asn His Ile Asn Asn Glu Arg Met Asn Gly Phe
645 650 655

Asn Gly Tyr Phe Gly Glu Glu Glu Gly Lys Asn Asn Gly Asn Leu Asn
660 665 670

Val Thr Phe Lys Gly Lys Ser Glu Gln Asn Arg Phe Leu Leu Thr Gly
675 680 685

Gly Thr Asn Leu Asn Gly Asp Leu Asn Val Gln Gln Gly Thr Leu Phe
690 695 700

Leu Ser Gly Arg Pro Thr Pro His Ala Arg Asp Ile Ala Gly Ile Ser
705 710 715 720

Ser Thr Lys Lys Asp Ser His Phe Ser Glu Asn Asn Glu Val Val Val 
725 730 735 

Glu Asp Asp Trp Ile Asn Arg Asn Phe Lys Ala Thr Asn Ile Asn Val
740 745 750

Thr Asn Asn Ala Thr Leu Tyr Ser Gly Arg Asn Val Glu Ser Ile Thr
755 760 765

Ser Asn Ile Thr Ala Ser Asn Asn Ala Lys Val His Ile Gly Tyr Lys
770 775 780

Ala Gly Asp Thr Val Cys Val Arg Ser Asp Tyr Thr Gly Tyr Val Thr 
785 790 795 800 

Cys Thr Thr Asp Lys Leu Ser Asp Lys Ala Leu Asn Ser Phe Asn Pro 
805 810 815 

Thr Asn Leu Arg Gly Asn Val Asn Leu Thr Glu Ser Ala Asn Phe Val 
820 825 830 

Leu Gly Lys Ala Asn Leu Phe Gly Thr Ile Gln Ser Arg Gly Asn Ser
835 840 845

Gln Val Arg Leu Thr Glu Asn Ser His Trp His Leu Thr Gly Asn Ser
850 855 860

Asp Val His Gln Leu Asp Leu Ala Asn Gly His Ile His Leu Asn Ser
865 870 875 880

Ala Asp Asn Ser Asn Asn Val Thr Lys Tyr Asn Thr Leu Thr Val Asn
885 890 895

Ser Leu Ser Gly Asn Gly Ser Phe Tyr Tyr Leu Thr Asp Leu Ser Asn
900 905 910

Lys Gln Gly Asp Lys Val Val Val Thr Lys Ser Ala Thr Gly Asn Phe
915 920 925

Thr Leu Gln Val Ala Asp Lys Thr Gly Glu Pro Asn His Asn Glu Leu
930 935 940

Thr Leu Phe Asp Ala Ser Lys Ala Gln Arg Asp His Leu Asn Val Ser
945 950 955 960

Leu Val Gly Asn Thr Val Asp Leu Gly Ala Trp Lys Tyr Lys Leu Arg 
965 970 975 

Asn Val Asn Gly Arg Tyr Asp Leu Tyr Asn Pro Glu Val Glu Lys Arg 
980 985 990 

Asn Gln Thr Val Asp Thr Thr Asn Ile Thr Thr Pro Asn Asn Ile Gln
995 1000 1005

Ala Asp Val Pro Ser Val Pro Ser Asn Asn Glu Glu Ile Ala Arg Val
1010 1015 1020

Asp Glu Ala Pro Val Pro Pro Pro Ala Pro Ala Thr Pro Ser Glu Thr
1025 1030 1035 1040

Thr Glu Thr Val Ala Glu Asn Ser Lys Gln Glu Ser Lys Thr Val Glu
1045 1050 1055

Lys Asn Glu Gln Asp Ala Thr Glu Thr Thr Ala Gln Asn Arg Glu Val
1060 1065 1070

Ala Lys Glu Ala Lys Ser Asn Val Lys Ala Asn Thr Gln Thr Asn Glu
1075 1080 1085

Val Ala Gln Ser Gly Ser Glu Thr Lys Glu Thr Gln Thr Thr Glu Thr
1090 1095 1100

Lys Glu Thr Ala Thr Val Glu Lys Glu Glu Lys Ala Lys Val Glu Thr
1105 1110 1115 1120

Glu Lys Thr Gln Glu Val Pro Lys Val Thr Ser Gln Val Ser Pro Lys
1125 1130 1135

Gln Glu Gln Ser Glu Thr Val Gln Pro Gln Ala Glu Pro Ala Arg Glu
1140 1145 1150

Asn Asp Pro Thr Val Asn Ile Lys Glu Pro Gln Ser Gln Thr Asn Thr
1155 1160 1165

Thr Ala Asp Thr Glu Gln Pro Ala Lys Glu Thr Ser Ser Asn Val Glu
1170 1175 1180

Gln Pro Val Thr Glu Ser Thr Thr Val Asn Thr Gly Asn Ser Val Val
1185 1190 1195 1200

Glu Asn Pro Glu Asn Thr Thr Pro Ala Thr Thr Gln Pro Thr Val Asn
1205 1210 1215

Ser Glu Ser Ser Asn Lys Pro Lys Asn Arg His Arg Arg Ser Val Arg 
1220 1225 1230 

Ser Val Pro His Asn Val Glu Pro Ala Thr Thr Ser Ser Asn Asp Arg 
1235 1240 1245 

Ser Thr Val Ala Leu Cys Asp Leu Thr Ser Thr Asn Thr Asn Ala Val
1250 1255 1260

Leu Ser Asp Ala Arg Ala Lys Ala Gln Phe Val Ala Leu Asn Val Gly
1265 1270 1275 1280

Lys Ala Val Ser Gln His Ile Ser Gln Leu Glu Met Asn Asn Glu Gly
1285 1290 1295

Gln Tyr Asn Val Trp Val Ser Asn Thr Ser Met Asn Lys Asn Tyr Ser
1300 1305 1310

Ser Ser Gln Tyr Arg Arg Phe Ser Ser Lys Ser Thr Gln Thr Gln Leu
1315 1320 1325

Gly Trp Asp Gln Thr Ile Ser Asn Asn Val Gln Leu Gly Gly Val Phe
1330 1335 1340

Thr Tyr Val Arg Asn Ser Asn Asn Phe Asp Lys Ala Thr Ser Lys Asn
1345 1350 1355 1360

Thr Leu Ala Gln Val Asn Phe Tyr Ser Lys Tyr Tyr Ala Asp Asn His
1365 1370 1375

Trp Tyr Leu Gly Ile Asp Leu Gly Tyr Gly Lys Phe Gln Ser Lys Leu
1380 1385 1390

Gln Thr Asn His Asn Ala Lys Phe Ala Arg His Thr Ala Gln Phe Gly
1395 1400 1405

Leu Thr Ala Gly Lys Ala Phe Asn Leu Gly Asn Phe Gly Ile Thr Pro
1410 1415 1420

Ile Val Gly Val Arg Tyr Ser Tyr Leu Ser Asn Ala Asp Phe Ala Leu
1425 1430 1435 1440

Asp Gln Ala Arg Ile Lys Val Asn Pro Ile Ser Val Lys Thr Ala Phe
1445 1450 1455

Ala Gln Val Asp Leu Ser Tyr Thr Tyr His Leu Gly Glu Phe Ser Val
1460 1465 1470

Thr Pro Ile Leu Ser Ala Arg Tyr Asp Ala Asn Gln Gly Ser Gly Lys
1475 1480 1485

Ile Asn Val Asn Gly Tyr Asp Phe Ala Tyr Asn Val Glu Asn Gln Gln
1490 1495 1500

Gln Tyr Asn Ala Gly Leu Lys Leu Lys Tyr His Asn Val Lys Leu Ser
1505 1510 1515 1520

Leu Ile Gly Gly Leu Thr Lys Ala Lys Gln Ala Glu Lys Gln Lys Thr
1525 1530 1535

Ala Glu Leu Lys Leu Ser Phe Ser Phe 
1540 1545 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1702 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Met Leu Asn Lys Lys Phe Lys Leu Asn Phe Ile Ala Leu Thr Val Ala
1 5 10 15

Tyr Ala Leu Thr Pro Tyr Thr Glu Ala Ala Leu Val Arg Asp Asp Val
20 25 30

Asp Tyr Gln Ile Phe Arg Asp Phe Ala Glu Asn Lys Gly Arg Phe Ser
35 40 45

Val Gly Ala Thr Asn Val Glu Val Arg Asp Lys Asn Asn His Ser Leu
50 55 60

Gly Asn Val Leu Pro Asn Gly Ile Pro Met Ile Asp Phe Ser Val Val
65 70 75 80

Asp Val Asp Lys Arg Ile Ala Thr Leu Ile Asn Pro Gln Tyr Val Val
85 90 95

Gly Val Lys His Val Ser Asn Gly Val Ser Glu Leu His Phe Gly Asn
100 105 110

Leu Asn Gly Asn Met Asn Asn Gly Asn Asp Lys Ser His Arg Asp Val
115 120 125

Ser Ser Glu Glu Asn Arg Tyr Phe Ser Val Glu Lys Asn Glu Tyr Pro 
130 135 140 

Thr Lys Leu Asn Gly Lys Ala Val Thr Thr Glu Asp Gln Thr Gln Lys
145 150 155 160

Arg Arg Glu Asp Tyr Tyr Met Pro Arg Leu Asp Lys Phe Val Thr Glu
165 170 175

Val Ala Pro Ile Glu Ala Ser Thr Ala Ser Ser Asp Ala Gly Thr Tyr
180 185 190

Asn Asp Gln Asn Lys Tyr Pro Ala Phe Val Arg Leu Gly Ser Gly Thr
195 200 205

Gln Phe Ile Tyr Lys Lys Gly Asp Asn Tyr Ser Leu Ile Leu Asn Asn
210 215 220

His Glu Val Gly Gly Asn Asn Leu Lys Leu Val Gly Asp Ala Tyr Thr
225 230 235 240

Tyr Gly Ile Ala Gly Thr Pro Tyr Lys Val Asn His Glu Asn Asn Gly
245 250 255

Leu Ile Gly Phe Gly Asn Ser Lys Glu Glu His Ser Asp Pro Lys Gly
260 265 270

Ile Leu Ser Gln Asp Pro Leu Thr Asn Tyr Ala Val Leu Gly Asp Ser
275 280 285

Gly Ser Pro Leu Phe Val Tyr Asp Arg Glu Lys Gly Lys Trp Leu Phe
290 295 300

Leu Gly Ser Tyr Asp Phe Trp Ala Gly Tyr Asn Lys Lys Ser Trp Gln
305 310 315 320

Glu Trp Asn Ile Tyr Lys Pro Glu Phe Ala Lys Thr Val Leu Asp Lys
325 330 335

Asp Thr Ala Gly Ser Leu Ile Gly Ser Asn Thr Gln Tyr Asn Trp Asn
340 345 350

Pro Thr Gly Lys Thr Ser Val Ile Ser Asn Gly Ser Glu Ser Leu Asn
355 360 365

Val Asp Leu Phe Asp Ser Ser Gln Asp Thr Asp Ser Lys Lys Asn Asn
370 375 380

His Gly Lys Ser Val Thr Leu Arg Gly Ser Gly Thr Leu Thr Leu Asn 
385 390 395 400 

Asn Asn Ile Asp Gln Gly Ala Gly Gly Leu Phe Phe Glu Gly Asp Tyr
405 410 415

Glu Val Lys Gly Thr Ser Asp Ser Thr Thr Trp Lys Gly Ala Gly Val
420 425 430

Ser Val Ala Asp Gly Lys Thr Val Thr Trp Lys Val His Asn Pro Lys
435 440 445

Ser Asp Arg Leu Ala Lys Ile Gly Lys Gly Thr Leu Ile Val Glu Gly
450 455 460

Lys Gly Glu Asn Lys Gly Ser Leu Lys Val Gly Asp Gly Thr Val Ile
465 470 475 480

Leu Lys Gln Gln Ala Asp Ala Asn Asn Lys Val Lys Ala Phe Ser Gln
485 490 495

Val Gly Ile Val Ser Gly Arg Ser Thr Val Val Leu Asn Asp Asp Lys
500 505 510

Gln Val Asp Pro Asn Ser Ile Tyr Phe Gly Phe Arg Gly Gly Arg Leu
515 520 525

Asp Ala Asn Gly Asn Asn Leu Thr Phe Glu His Ile Arg Asn Ile Asp
530 535 540

Asp Gly Ala Arg Leu Val Asn His Asn Thr Ser Lys Thr Ser Thr Val
545 550 555 560

Thr Ile Thr Gly Glu Ser Leu Ile Thr Asp Pro Asn Thr Ile Thr Pro
565 570 575

Tyr Asn Ile Asp Ala Pro Asp Glu Asp Asn Pro Tyr Ala Phe Arg Arg
580 585 590

Ile Lys Asp Gly Gly Gln Leu Tyr Leu Asn Leu Glu Asn Tyr Thr Tyr
595 600 605

Tyr Ala Leu Arg Lys Gly Ala Ser Thr Arg Ser Glu Leu Pro Lys Asn
610 615 620

Ser Gly Glu Ser Asn Glu Asn Trp Leu Tyr Met Gly Lys Thr Ser Asp
625 630 635 640

Ala Ala Lys Arg Asn Val Met Asn His Ile Asn Asn Glu Arg Met Asn
645 650 655

Gly Phe Asn Gly Tyr Phe Gly Glu Glu Glu Gly Lys Asn Asn Gly Asn
660 665 670

Leu Asn Val Thr Phe Lys Gly Lys Ser Glu Gln Asn Arg Phe Leu Leu
675 680 685

Thr Gly Gly Thr Asn Leu Asn Gly Asp Leu Lys Val Glu Lys Gly Thr
690 695 700

Leu Phe Leu Ser Gly Arg Pro Thr Pro His Ala Arg Asp Ile Ala Gly
705 710 715 720

Ile Ser Ser Thr Lys Lys Asp Gln His Phe Ala Glu Asn Asn Glu Val
725 730 735

Val Val Glu Asp Asp Trp Ile Asn Arg Asn Phe Lys Ala Thr Asn Ile
740 745 750

Asn Val Thr Asn Asn Ala Thr Leu Tyr Ser Gly Arg Asn Val Ala Asn
755 760 765

Ile Thr Ser Asn Ile Thr Ala Ser Asp Asn Ala Lys Val His Ile Gly
770 775 780

Tyr Lys Ala Gly Asp Thr Val Cys Val Arg Ser Asp Tyr Thr Gly Tyr
785 790 795 800

Val Thr Cys Thr Thr Asp Lys Leu Ser Asp Lys Ala Leu Asn Ser Phe
805 810 815

Asn Ala Thr Asn Val Ser Gly Asn Val Asn Leu Ser Gly Asn Ala Asn
820 825 830

Phe Val Leu Gly Lys Ala Asn Leu Phe Gly Thr Ile Ser Gly Thr Gly
835 840 845

Asn Ser Gln Val Arg Leu Thr Glu Asn Ser His Trp His Leu Thr Gly
850 855 860

Asp Ser Asn Val Asn Gln Leu Asn Leu Asp Lys Gly His Ile His Leu
865 870 875 880

Asn Ala Gln Asn Asp Ala Asn Lys Val Thr Thr Tyr Asn Thr Leu Thr
885 890 895

Val Asn Ser Leu Ser Gly Asn Gly Ser Phe Tyr Tyr Leu Thr Asp Leu
900 905 910

Ser Asn Lys Gln Gly Asp Lys Val Val Val Thr Lys Ser Ala Thr Gly
915 920 925

Asn Phe Thr Leu Gln Val Ala Asp Lys Thr Gly Glu Pro Thr Lys Asn
930 935 940

Glu Leu Thr Leu Phe Asp Ala Ser Asn Ala Thr Arg Asn Asn Leu Asn
945 950 955 960

Val Ser Leu Val Gly Asn Thr Val Asp Leu Gly Ala Trp Lys Tyr Lys 
965 970 975 

Leu Arg Asn Val Asn Gly Arg Tyr Asp Leu Tyr Asn Pro Glu Val Glu 
980 985 990 

Lys Arg Asn Gln Thr Val Asp Thr Thr Asn Ile Thr Thr Pro Asn Asn
995 1000 1005

Ile Gln Ala Asp Val Pro Ser Val Pro Ser Asn Asn Glu Glu Ile Ala
1010 1015 1020

Arg Val Glu Thr Pro Val Pro Pro Pro Ala Pro Ala Thr Pro Ser Glu
1025 1030 1035 1040

Thr Thr Glu Thr Val Ala Glu Asn Ser Lys Gln Glu Ser Lys Thr Val
1045 1050 1055

Glu Lys Asn Glu Gln Asp Ala Thr Glu Thr Thr Ala Gln Asn Gly Glu
1060 1065 1070

Val Ala Glu Glu Ala Lys Pro Ser Val Lys Ala Asn Thr Gln Thr Asn
1075 1080 1085

Glu Val Ala Gln Ser Gly Ser Glu Thr Glu Glu Thr Gln Thr Thr Glu
1090 1095 1100

^Ac LyS Glu Thr Ala Lys Val Glu Glu Glu L y s Ala Lys Val Glu 

1105 1110 HIS 1120 

Lys Glu Glu Lys Ala Lys Val Glu Lys Asp Glu Ile Gln Glu Ala Pro
1125 1130 1135

Gln Met Ala Ser Glu Thr Ser Pro Lys Gln Ala Lys Pro Ala Pro Lys
1140 1145 1150

Glu Val Ser Thr Asp Thr Lys Val Glu Glu Thr Gln Val Gln Ala Gln
1155 1160 1165

Pro Thr Gln Ser Thr Thr Val Ala Ala Ala Glu Ala Thr Ser Pro
1170 1175 1180

Asn Ser Lys Pro Ala Glu Glu Thr Gln Pro Ser Glu Lys Thr Asn Ala
1185 1190 1195 1200

Glu Pro Val Thr Pro Val Val Ser Lys Asn Gln Thr Glu Asn Thr Thr
1205 1210 1215

Asp Gln Pro Thr Glu Arg Glu Lys Thr Ala Lys Val Glu Thr Glu Lys 
1220 1225 1230 

Thr Gln Glu Pro Pro Gln Val Ala Ser Gln Ala Ser Pro Lys Gln Glu 
1235 1240 1245 

Gln ? e !L G1U Thr Val Gln Pro Gln Ala Val G *u Ser Glu Asn Val 

1250 1255 1260 

Pro Thr Val Asn Asn Ala Glu Glu Val Gln Ala Gln Leu Gln Thr Gln 
1265 1270 1275 1280 

Thr Ser Ala Thr Val Ser Thr Lys Gln Pro Ala Pro Glu Asn Ser Ile
1285 1290 1295

Asn Thr Gly Ser Ala Thr Ala Ile Thr Glu Thr Ala Glu Lys Ser Asp
1300 1305 1310

Lys Pro Gln Thr Glu Thr Ala Ala Ser Thr Glu Asp Ala Ser Gln His 
1315 1320 1325 

Lys Ala Asn Thr Val Ala Asp Asn Ser Val Ala Asn Asn Ser Glu Ser 
- L ->- J0 1335 1340 

Ser Glu Pro Lys Ser Arg Arg Arg Arg Ser Ile Ser Gln Pro Gln Glu
1345 1350 1355 1360

Thr Ser Ala Glu Glu Thr Thr Ala Ala Ser Thr Asp Glu Thr Thr Ile
1365 1370 1375

Ala Asp Asn Ser Lys Arg Ser Lys Pro Asn Arg Arg Ser Arg Arg Ser
1380 1385 1390

Val Arg Ser Glu Pro Thr Val Thr Asn Gly Ser Asp Arg Ser Thr Val
1395 1400 1405

Ala Leu Arg Asp Leu Thr Ser Thr Asn Thr Asn Ala Val Ile Ser Asp
1410 1415 1420

Ala Met Ala Lys Ala Gln Phe Val Ala Leu Asn Val Gly Lys Ala Val
1425 1430 1435 1440

Ser Gln His Ile Ser Gln Leu Glu Met Asn Asn Glu Gly Gln Tyr Asn
1445 1450 1455

Val Trp Val Ser Asn Thr Ser Met Asn Glu Asn Tyr Ser Ser Ser Gln
1460 1465 1470

Tyr Arg Arg Phe Ser Ser Lys Ser Thr Gln Thr Gln Leu Gly Trp Asp
1475 1480 1485

Gln Thr Ile Ser Asn Asn Val Gln Leu Gly Gly Val Phe Thr Tyr Val
1490 1495 1500

Arg Asn Ser Asn Asn Phe Asp Lys Ala Ser Ser Lys Asn Thr Leu Ala
1505 1510 1515 1520

Gln Val Asn Phe Tyr Ser Lys Tyr Tyr Ala Asp Asn His Trp Tyr Leu
1525 1530 1535

Gly Ile Asp Leu Gly Tyr Gly Lys Phe Gln Ser Asn Leu Lys Thr Asn
1540 1545 1550

His Asn Ala Lys Phe Ala Arg His Thr Ala Gln Phe Gly Leu Thr Ala
1555 1560 1565

Gly Lys Ala Phe Asn Leu Gly Asn Phe Gly Ile Thr Pro Ile Val Gly
1570 1575 1580

Val Arg Tyr Ser Tyr Leu Ser Asn Ala Asn Phe Ala Leu Ala Lys Asp
1585 1590 1595 1600

Arg Ile Lys Val Asn Pro Ile Ser Val Lys Thr Ala Phe Ala Gln Val
1605 1610 1615

Asp Leu Ser Tyr Thr Tyr His Leu Gly Glu Phe Ser Val Thr Pro Ile
1620 1625 1630

Leu Ser Ala Arg Tyr Asp Thr Asn Gln Gly Ser Gly Lys Ile Asn Val
1635 1640 1645

Asn Gln Tyr Asp Phe Ala Tyr Asn Val Glu Asn Gln Gln Gln Tyr Asn
1650 1655 1660

Ala Gly Leu Lys Leu Lys Tyr His Asn Val Lys Leu Ser Leu Ile Gly
1665 1670 1675 1680

Gly Leu Thr Lys Ala Lys Gln Ala Glu Lys Gln Lys Thr Ala Glu Leu
1685 1690 1695

Lys Leu Ser Phe Ser Phe 
1700 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1848 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Leu Asn Lys Lys Phe Lys Leu Asn Phe Ile Ala Leu Thr Val Ala
1 5 10 15

Tyr Ala Leu Thr Pro Tyr Thr Glu Ala Ala Leu Val Arg Asp Asp Val
20 25 30

Asp Tyr Gln Ile Phe Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Ser
35 40 45

Val Gly Ala Thr Asn Val Glu Val Arg Asp Lys Lys Asn Gln Ser Leu
50 55 60

Gly Ser Ala Leu Pro Asn Gly Ile Pro Met Ile Asp Phe Ser Val Val
65 70 75 80

Asp Val Asp Lys Arg Ile Ala Thr Leu Val Asn Pro Gln Tyr Val Val
85 90 95

Gly Val Lys His Val Ser Asn Gly Val Ser Glu Leu His Phe Gly Asn
100 105 110

Leu Asn Gly Asn Met Asn Asn Gly Asn Ala Lys Ser His Arg Asp Val
115 120 125

Ser Ser Glu Glu Asn Arg Tyr Tyr Thr Val Glu Lys Asn Asn Phe Pro
130 135 140

Thr Glu Asn Val Thr Ser Phe Thr Lys Glu Glu Gln Asp Ala Gln Lys
145 150 155 160

Arg Arg Glu Asp Tyr Tyr Met Pro Arg Leu Asp Lys Phe Val Thr Glu 
165 170 175 

Val Ala Pro Ile Glu Ala Ser Thr Ala Asn Asn Asn Lys Gly Glu Tyr
180 185 190

Asn Asn Ser Asp Lys Tyr Pro Ala Phe Val Arg Leu Gly Ser Gly Thr
195 200 205

Gln Phe Ile Tyr Lys Lys Gly Ser Arg Tyr Gln Leu Ile Leu Thr Glu
210 215 220

Lys Asp Lys Gln Gly Asn Leu Leu Arg Asn Trp Asp Val Gly Gly Asp
225 230 235 240

Asn Leu Glu Leu Val Gly Asn Ala Tyr Thr Tyr Gly Ile Ala Gly Thr
245 250 255

Pro Tyr Lys Val Asn His Glu Asn Asn Gly Leu Ile Gly Phe Gly Asn
260 265 270

Ser Lys Glu Glu His Ser Asp Pro Lys Gly Ile Leu Ser Gln Asp Pro
275 280 285

Leu Thr Asn Tyr Ala Val Leu Gly Asp Ser Gly Ser Pro Leu Phe Val
290 295 300

Tyr Asp Arg Glu Lys Gly Lys Trp Leu Phe Leu Gly Ser Tyr Asn Phe
305 310 315 320

Trp Ala Gly Tyr Asn Lys Lys Ser Trp Gln Glu Trp Asn Ile Tyr Lys
325 330 335

His Glu Phe Ala Glu Lys Ile Tyr Gln Gln Tyr Ser Ala Gly Ser Leu
340 345 350

Ile Gly Ser Asn Thr Gln Tyr Thr Trp Gln Ala Thr Gly Ser Thr Ser
355 360 365

Thr Ile Thr Gly Gly Gly Glu Pro Leu Ser Val Asp Leu Thr Asp Gly
370 375 380

Lys Asp Lys Pro Asn His Gly Lys Ser Ile Thr Leu Lys Gly Ser Gly
385 390 395 400

Thr Leu Thr Leu Asn Asn His Ile Asp Gln Gly Ala Gly Gly Leu Phe
405 410 415 

Phe Glu Gly Asp Tyr Glu Val Lys Gly Thr Ser Asp Ser Thr Thr Trp 
420 425 430 

Lys Gly Ala Gly Val Ser Val Ala Asp Gly Lys Thr Val Thr Trp Lys 
435 440 445 

Val His Asn Pro Lys Tyr Asp Arg Leu Ala Lys Ile Gly Lys Gly Thr
450 455 460

Leu Val Val Glu Gly Lys Gly Lys Asn Glu Gly Leu Leu Lys Val Gly
465 470 475 480

Asp Gly Thr Val Ile Leu Lys Gln Lys Ala Asp Ala Asn Asn Lys Val
485 490 495

Gln Ala Phe Ser Gln Val Gly Ile Val Ser Gly Arg Ser Thr Leu Val
500 505 510

Leu Asn Asp Asp Lys Gln Val Asp Pro Asn Ser Ile Tyr Phe Gly Phe
515 520 525

Arg Gly Gly Arg Leu Asp Leu Asn Gly Asn Ser Leu Thr Phe Asp His
530 535 540

Ile Arg Asn Ile Asp Asp Gly Ala Arg Val Val Asn His Asn Met Thr
545 550 555 560

Asn Thr Ser Asn Ile Thr Ile Thr Gly Glu Ser Leu Ile Thr Asn Pro
565 570 575

Asn Thr Ile Thr Ser Tyr Asn Ile Glu Ala Gln Asp Asp Asp His Pro
580 585 590

Leu Arg Ile Arg Ser Ile Pro Tyr Arg Gln Leu Tyr Phe Asn Gln Asp
595 600 605 

Asn Arg Ser Tyr Tyr Thr Leu Lys Lys Gly Ala Ser Thr Arg Ser Glu 
610 615 620 

Leu Pro Gln Asn Ser Gly Glu Ser Asn Glu Asn Trp Leu Tyr Met Gly
625 630 635 640

Arg Thr Ser Asp Ala Ala Lys Arg Asn Val Met Asn His Ile Asn Asn
645 650 655

Glu Arg Met Asn Gly Phe Asn Gly Tyr Phe Gly Glu Glu Glu Thr Lys
660 665 670

Ala Thr Gln Asn Gly Lys Leu Asn Val Thr Phe Asn Gly Lys Ser Asp
675 680 685

Gln Asn Arg Phe Leu Leu Thr Gly Gly Thr Asn Leu Asn Gly Asp Leu
690 695 700 

Asn Val Glu Lys Gly Thr Leu Phe Leu Ser Gly Arg Pro Thr Pro His 
705 710 715 720 

Ala Arg Asp Ile Ala Gly Ile Ser Ser Thr Lys Lys Asp Pro His Phe
725 730 735

Thr Glu Asn Asn Glu Val Val Val Glu Asp Asp Trp Ile Asn Arg Asn
740 745 750

Phe Lys Ala Thr Thr Met Asn Val Thr Gly Asn Ala Ser Leu Tyr Ser
755 760 765

Gly Arg Asn Val Ala Asn Ile Thr Ser Asn Ile Thr Ala Ser Asn Asn
770 775 780

Ala Gln Val His Ile Gly Tyr Lys Thr Gly Asp Thr Val Cys Val Arg
785 790 795 800

Ser Asp Tyr Thr Gly Tyr Val Thr Cys His Asn Ser Asn Leu Ser Glu
805 810 815

Lys Ala Leu Asn Ser Phe Asn Pro Thr Asn Leu Arg Gly Asn Val Asn
820 825 830

Leu Thr Glu Asn Ala Ser Phe Thr Leu Gly Lys Ala Asn Leu Phe Gly
835 840 845

Thr Ile Gln Ser Ile Gly Thr Ser Gln Val Asn Leu Lys Glu Asn Ser
850 855 860

His Trp His Leu Thr Gly Asn Ser Asn Val Asn Gln Leu Asn Leu Thr
865 870 875 880

Asn Gly His Ile His Leu Asn Ala Gln Asn Asp Ala Asn Lys Val Thr
885 890 895

Thr Tyr Asn Thr Leu Thr Val Asn Ser Leu Ser Gly Asn Gly Ser Phe
900 905 910

Tyr Tyr Trp Val Asp Phe Thr Asn Asn Lys Ser Asn Lys Val Val Val
915 920 925

Asn Lys Ser Ala Thr Gly Asn Phe Thr Leu Gln Val Ala Asp Lys Thr
930 935 940

Gly Glu Pro Asn His Asn Glu Leu Thr Leu Phe Asp Ala Ser Asn Ala
945 950 955 960

Thr Arg Asn Asn Leu Glu Val Thr Leu Ala Asn Gly Ser Val Asn Arg
965 970 975

Gly Ala Trp Lys Tyr Lys Leu Arg Asn Val Asn Gly Arg Tyr Asp Leu
980 985 990

Tyr Asn Pro Glu Val Glu Lys Arg Asn Gln Thr Val Asp Thr Thr Asn
995 1000 1005

I1S T^n Thr Pr ° ASn Iie Gin Ala Ala Pro s er Ala Gin Ser 

1010 1015 1020 

Asn Asn Glu Glu Ile Ala Arg Val Glu Thr Pro Val Pro Pro Pro Ala
1025 1030 1035 1040

Pro Ala Thr Glu Ser Ala Ile Ala Ser Glu Gln Pro Glu Thr Arg Pro
1045 1050 1055

Ala Glu Thr Ala Gln Pro Ala Met Glu Glu Thr Asn Thr Ala Asn Ser
1060 1065 1070

Thr Glu Thr Ala Pro Lys Ser Asp Thr Ala Thr Gln Thr Glu Asn Pro
1075 1080 1085 

Asn Ser Glu Ser Val Pro Ser Glu Thr Thr Glu Lys Val Ala Glu Asn 
1090 1095 1100 

Pro Pro Gln Glu Asn Glu Thr Val Ala Lys Asn Glu Gln Glu Ala Thr
1105 1110 1115 1120

Glu Pro Thr Pro Gln Asn Gly Glu Val Ala Lys Glu Asp Gln Pro Thr
1125 1130 1135

Val Glu Ala Asn Thr Gln Thr Asn Glu Ala Thr Gln Ser Glu Gly Lys
1140 1145 1150

Thr Glu Glu Thr Gln Thr Ala Glu Thr Lys Ser Glu Pro Thr Glu Ser
1155 1160 1165

Val Thr Val Ser Glu Asn Gln Pro Glu Lys Thr Val Ser Gln Ser Thr
1170 1175 1180

Glu Asp Lys Val Val Val Glu Lys Glu Glu Lys Ala Lys Val Glu Thr
1185 1190 1195 1200

Glu Glu Thr Gln Lys Ala Pro Gln Val Thr Ser Lys Glu Pro Pro Lys
1205 1210 1215

Gln Ala Glu Pro Ala Pro Glu Glu Val Pro Thr Asp Thr Asn Ala Glu
1220 1225 1230

Glu Ala Gln Ala Leu Gln Gln Thr Gln Pro Thr Thr Val Ala Ala Ala
1235 1240 1245

Glu Thr Thr Ser Pro Asn Ser Lys Pro Ala Glu Glu Thr Gln Gln Pro
1250 1255 1260 

Ser Glu Lys Thr Asn Ala Glu Pro Val Thr Pro Val Val Ser Glu Asn 
1265 1270 1275 1280 

Thr Ala Thr Gln Pro Thr Glu Thr Glu Glu Thr Ala Lys Val Glu Lys
1285 1290 1295

Glu Lys Thr Gln Glu Val Pro Gln Val Ala Ser Gln Glu Ser Pro Lys
1300 1305 1310

Gln Glu Gln Pro Ala Ala Lys Pro Gln Ala Gln Thr Lys Pro Gln Ala
1315 1320 1325

Glu Pro Ala Arg Glu Asn Val Leu Thr Thr Lys Asn Val Gly Glu Pro
1330 1335 1340

Gln Pro Gln Ala Gln Pro Gln Thr Gln Ser Thr Ala Val Pro Thr Thr
1345 1350 1355 1360

Gly Glu Thr Ala Ala Asn Ser Lys Pro Ala Ala Lys Pro Gln Ala Gln
1365 1370 1375

Ala Lys Pro Gln Thr Glu Pro Ala Arg Glu Asn Val Ser Thr Val Asn
1380 1385 1390

Thr Lys Glu Pro Gln Ser Gln Thr Ser Ala Thr Val Ser Thr Glu Gln
1395 1400 1405

Pro Ala Lys Glu Thr Ser Ser Asn Val Glu Gln Pro Ala Pro Glu Asn
1410 1415 1420

Ser Ile Asn Thr Gly Ser Ala Thr Thr Met Thr Glu Thr Ala Glu Lys
1425 1430 1435 1440

Ser Asp Lys Pro Gln Met Glu Thr Val Thr Glu Asn Asp Arg Gln Pro
1445 1450 1455

Glu Ala Asn Thr Val Ala Asp Asn Ser Val Ala Asn Asn Ser Glu Ser 
1460 1465 1470 

Ser Glu Ser Lys Ser Arg Arg Arg Arg Ser Val Ser Gln Pro Lys Glu
1475 1480 1485

?!L Ala Glu Glu Thr Thr Val Ala Ser Thr Gln Glu Thr Thr Val
1490 1495 1500

Asp Asn Ser Val Ser Thr Pro Lys Pro Arg Ser Arg Arg Thr Arg Arg
1505 1510 1515 1520

Ser Val Gln Thr Asn Ser Tyr Glu Pro Val Glu Leu Pro Thr Glu Asn
1525 1530 1535

Ala Glu Asn Ala Glu Asn Val Gln Ser Gly Asn Asn Val Ala Asn Ser
1540 1545 1550

Gln Pro Ala Leu Arg Asn Leu Thr Ser Lys Asn Thr Asn Ala Val Ile
1555 1560 1565

Ser Asp Ala Met Ala Lys Ala Gln Phe Val Ala Leu Asn Val Gly Lys
1570 1575 1580

A 5B5 Val H±S ^o„ Ser Gln LeU Glu Met Asn Asn Glu Gly Gin 

1585 1590 1595 1600

Tyr Asn Val Trp Ile Ser Asn Thr Ser Met Asn Lys Asn Tyr Ser Ser
1605 1610 1615

Glu Gln Tyr Arg Arg Phe Ser Ser Lys Ser Thr Gln Thr Gln Leu Gly
1620 1625 1630

Trp Asp Gln Thr Ile Ser Asn Asn Val Gln Leu Gly Gly Val Phe Thr
1635 1640 1645

itso^ 9 ASn Jlss^ 6 ASP LyS Ala i|60 SSr ^ ASn Thr 

Leu^Ala Gln Val Asn JJe^Tyr Ser Lys Tyr Tyr_Ala Asp Asn His Trp 

Tyr Leu Gly lie Asp Leu Gly Tyr Gly Lys Phe Gln Ser Asn Leu Gln 
1685 1690 1695

Thr Asn Asn Asn Ala Lys Phe Ala Arg His Thr Ala Gln Ile Gly Leu
1700 1705 1710

Thr Ala Gly Lys Ala Phe Asn Leu Gly Asn Phe Ala Val Lys Pro Thr 
1715 1720 1725 

?730 Val ^ It'Is^ 11 ASn Ala A ?P- Phe Ala Ala 



1740 



Gln Asp Arg Ile Lys Val Asn Pro Ile Ser Val Lys Thr Ala Phe Ala
1745 1750 1755 1760

Gln Val Asp Leu Ser Tyr Thr Tyr His Leu Gly Glu Phe Ser Ile Thr
1765 1770 1775

Pro Ile Leu Ser Ala Arg Tyr Asp Ala Asn Gln Gly Asn Gly Lys Ile
1780 1785 1790

Asn Val Ser Val Tyr Asp Phe Ala Tyr Asn Val Glu Asn Gln Gln Gln
1795 1800 1805

Tyr Asn Ala Gly Leu Lys Leu Lys Tyr His Asn Val Lys Leu Ser Leu
1810 1815 1820

Ile Gly Gly Leu Thr Lys Ala Lys Gln Ala Glu Lys Gln Lys Thr Ala
1825 1830 1835 1840 

Glu Val Lys Leu Ser Phe Ser Phe 
1845 
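
Listings in this three-letter format are easily damaged when scanned or retyped. The following Python sketch is illustrative only and is not part of the specification. It tokenizes listing rows, skips the position rulers, and reports any token that is not one of the twenty standard residue codes. The function name `parse_listing` and the substitution table `OCR_FIXES` are assumptions introduced here for illustration.

```python
import re

# The twenty standard three-letter residue codes.
STANDARD = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Assumed scanner misreadings (hypothetical table, not taken from the specification).
OCR_FIXES = {"Gin": "Gln", "lie": "Ile", "He": "Ile"}


def parse_listing(lines):
    """Return (residues, suspects) from rows of a three-letter listing.

    Rows consisting only of digits and spaces are position rulers and are skipped.
    Tokens that cannot be resolved to a standard code are collected as suspects.
    """
    residues, suspects = [], []
    for line in lines:
        if not line.strip() or re.fullmatch(r"[\d\s]+", line):
            continue
        for token in line.split():
            token = OCR_FIXES.get(token, token)
            if token in STANDARD:
                residues.append(token)
            else:
                suspects.append(token)
    return residues, suspects


# The last three rows of SEQ ID NO: 6 as they might appear in a scanned copy.
rows = [
    "Tyr Asn Ala Gly Leu Lys Leu Lys Tyr His Asn Val Lys Leu Ser Leu",
    "1810 1815 1820",
    "lie Gly Gly Leu Thr Lys Ala Lys Gin Ala Glu Lys Gin Lys Thr Ala",
    "1825 1830 1835 1840",
    "Glu Val Lys Leu Ser Phe Ser Phe",
    "1845",
]
residues, suspects = parse_listing(rows)
print(len(residues), suspects)  # 40 []
```

Applied to a complete listing, the residue count can be compared with the stated LENGTH, and any suspect tokens point to rows that need to be checked against the original document.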

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Gly Asp Ser Gly Ser Pro Met Phe 
1 5 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Gly Asp Ser Gly Ser Pro Leu Phe 
1 5 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

His Thr Tyr Phe Gly Ile Asp
1 5 
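
The short peptides of SEQ ID NOS: 7 to 9 are easier to compare in one-letter notation. The Python sketch below is illustrative only and is not part of the specification. It converts three-letter listings to one-letter strings and reports the positions at which two equal-length sequences differ. The helper names `to_one_letter` and `differences` are assumptions introduced here, and the sixth residue of SEQ ID NO: 9 is read as Ile.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter):
    """Convert a space-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split())


def differences(a, b):
    """Return (position, residue_a, residue_b) for each mismatch, 1-based."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


seq7 = to_one_letter("Gly Asp Ser Gly Ser Pro Met Phe")
seq8 = to_one_letter("Gly Asp Ser Gly Ser Pro Leu Phe")
seq9 = to_one_letter("His Thr Tyr Phe Gly Ile Asp")

print(seq7, seq8, seq9)         # GDSGSPMF GDSGSPLF HTYFGID
print(differences(seq7, seq8))  # [(7, 'M', 'L')]
```

SEQ ID NO: 7 and SEQ ID NO: 8 thus differ only at position 7 (Met versus Leu).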

CLAIMS

1. A recombinant Haemophilus adhesion and penetration 
protein. 

2. A recombinant Haemophilus adhesion and penetration 
protein according to claim 1 which has a sequence 
homologous to that shown in Figure 6.

3. A recombinant Haemophilus adhesion and penetration
protein according to claim 1 which has the sequence
shown in Figure 6.

4. A recombinant nucleic acid encoding an Haemophilus
adhesion and penetration protein.

5. The nucleic acid of claim 3 comprising DNA having
a sequence homologous to that shown in Figure 6.

6. An expression vector comprising transcriptional and
translational regulatory nucleic acid operably linked 
to nucleic acid encoding an Haemophilus adhesion and 
penetration protein. 

7. A host cell transformed with an expression vector 
comprising a nucleic acid encoding an Haemophilus 
adhesion and penetration protein. 

8. A method of producing an Haemophilus adhesion and 
penetration protein comprising: 

a) culturing a host cell transformed with an
expression vector comprising a nucleic acid
encoding an Haemophilus adhesion and penetration
protein; and

b) expressing said nucleic acid to produce an
Haemophilus adhesion and penetration protein.

9. A vaccine comprising a pharmaceutically acceptable
carrier and an Haemophilus adhesion and penetration
protein for prophylactic or therapeutic use in
generating an immune response.

10. A vaccine according to claim 8 wherein said 
Haemophilus adhesion and penetration protein has a 
sequence homologous to that shown in Figure 6. 

11. A monoclonal antibody capable of binding to an
Haemophilus adhesion and penetration protein.

12. A method of treating or preventing Haemophilus 
influenzae infection comprising administering the 
vaccine of claim 9 or 10. 

30 50 70 90

i6 TA^ACCAAAAATTACTTAATTAAATAAACATTAT&AAAAAAACTCTATTTCCTCTTAATTTT 



T CA ATACTCGTTT A A CT A GT ATTTTTTAATAC 6AAA AA \ ial. | M K K T y F R L N F I 
-35 - 10 

•ma 130 ISO X70 

taa 310 330 3S0 

GCCCCGATGATTGATTTttItCTAGTCTCAC^^ 

APMIOFSVVSRN GVAALVENQTIV * v a n " * . 

„* 390 410 430 450 

GGATATAC^GA^ 

478 490 510 530 

AAAGATAATTTACATCCTTATGAGGACGATTACCATAATCCACGATTACAT^ 

KDNLHPYEDDYHNPRLHKFVT tAAri uh 

cca 578 590 610 630 

AATATGAATGGCAGTACTTATTCAGATAGAACAAAATATCCAG^ 

esa 670 690 710 

•noi 7S0 770 790 810 

ttgggagg1?atcttcgtaaagcgggagaatatgctc^ttaccgattgcaggctcaaagggggacactgct^ 

LGGOVRKAGEYGPLPIAGSKGOSGSPMF I Y 
pan 850 870 890 

gatgctgaaaaacaaaaatIgttaattaatgggatattacgggaaggcaacccttttgaaggcaaagaaaatgggtttcaat^ 

OAEKQKHLINGI LREGNPFEGKENGFQLVR 



AAATCTTATTTTGATGAAATTTTC6AAAW0ATTTACATACA 

KSYF0EIFER0LHTSLYTRAGNGVYTI5&N 

-iaia 1030 1050 1070 

GATAATGGTCAGGGGTCTATAACTCAGAAATCAGGAATACCATCAWAATTAAAATTACGTTAG^ 

DNGQGSITQKSGIPSEIKlTLANHSLPLKt 

1090 1110 1130 1150 1170

AAGGATA^A^TCATAATCCTAGATATMCGGACCTAATATTTATTCTCCACGTTTAAAC^ 

KOKVHNPRYDGPHIYSPRLNMGETLTI-Muy 

1190 1210 1230 1250

AAACAAGGATCATTAATCTTCGCATCTGACATTAACCAAGGGGCGGGTGGTC^ 

KQGSLIFASOIMQGAGGLYFEGNFTVSPns 

1270 1290 1310 1330 1350

AACCAAAmGGCAAGGAGaGGCATA^TGTAAGTGAAAATAG 

1370 1390 1410 1430

aaaattggtaaaggaaSttccacgttcaagccaaagggga^^ 



950 970 990 
1450 1470 1490 1510 1530

CACCCACACCATCAACGCAACAAACAACCCTTTACTGAAATTCCCTTCCTTAGCGCCAGACCCACTGTTCAATTAAACGATCATAAACAA 
QAOOQGNKQAFSEIGLVSGRGTVQLKDOKQ 

, L 1550 1570 1590 1610 

TTTGATACCGATAAATTTTATTTCGGCTTTCGTGGT^ 

FDTDKFYFGFRGGRLDLNGHSLTFKR1 qTt 

1630 1650 1670 1690 1710 

GACGAGGGGGCAATGATTGTCAACCATAATACAACTCAAGCCGCTAATCTCACTATTACTGGGAACGAAAGCATTGTTCTACCTAA^GA 
OECAMIVMHNTTQAANVTITGNESIVLP H Q 

1730 1750 1770 1790 

^^^^^AATAAACTTGATTACAGAAAAGAAATTGCCTACAACGGTTGGTTTGGCGAAACAGATAAAAATAAACACAATCGGCGATTA 
NMINKLDYRKEIAYNGWFGETDKNKHN 6 R L 

1810 1830 1850 1870 1890

AACmATTTATAAACCAACCACAGAAGATCCTACTTTGCTACTTTCAGCTGGTACAAAmAAAAGGCGATATTACCCAAA 
NLIYKPTTEDRTLLLSGGTNLKGDITQTKG 

1910 1930 1950 1970 

AAACTATTTTTCAGCGGTAGACCGACACCGCACGCCTACAATCATTTAAATAAACGTTGCTCAGAAATGGAAGCTATACCACAAGGCCAA 
KLFFSGRPTPHAYNHLNKRWSEMEGIPQGE 

1990 2010 2030 2050 2070 

ATTCTGTGGGATCACGATTGGATCAACCGTACATTTAAAGCTGAAAACTTCCAAATTAAAGGCGGAAGTGCGGTGGTTTCTCGCAATGTT 
IVWOHOWINRTFKAENFQIKGGSAVVSRNV 

2090 2110 2130 2150 

TCTTCAATTGAGGGAAATTGGACAGTCAGCAATAATGCAAATGCCACATTTGGTGTTGTCCCAAATCAACAAAATACCATTTCCACGCGT 
SSI EGNWTVSNNANATFGVVPNQQNTICTR 

2170 2190 2210 2230 2250

TCAGATTGGACAGGATTAACGACTTGTCAAAAAGTGGATTTAACCGATACAAAAGTTATTAATTCTATACCAAAAACACAAATCAATCGC 
SOWTGLTTCQKVOLTOTKVI NS IPKTQING 

2270 2290 2310 2330 

TCTATTAATTTAACTGATAATGCAACGGCGAATGTTAAAGGTTTAGCAAAACTTAATGGCAATGTCACTTTAACAAATCACAGCCAATTT 
SIN LTDNATAHVKGLAKLNGNVTLTNHSQF 

2350 2370 2390 2410 2430 

ACATTAAGCAACAATGCCACCCAAATAGGCAATATTCGACTTTCCGACAATTCAACTGCAACGGTGGATAATGCAAACTTGAACGGTAAT 
TLSNNATQIGNIRLSDNSTATVDNANLNGN 

2450 2470 2490 2510 

^^^■^^^T"^^^^^*^*^"^"^ ^ A ^^T^ AA TTXTCTXTAAAAAACAGCCATTTTTCGC ACCAAATTCAGGGAGACAAAGCCACAACACTGACCXTC 
VHLTDSAQFSLKNSHFSMQIQGOKGTTVTL 

2530 2550 2570 2590 2610 

GAA^TGCGACTTGGACAATGCCTAGCGATACTACATTGCAGAATTTAACGCTAAATAACAGTACGATCACGTTAAATTCAGCTTATTCA 
EHATWTMP SDTTLQNLTLNNST ITLNSAYS 

2630 2650 2670 2690 

GCTAGCTCAAACAATACGCCACGTCGCCGTTCATTAGAGACGGAAACAACGCCAACATCGGCAGAACATCCTTTCAACACATTGACAGTA 
ASSNNTPRRRSLETETTPTSAEHRFNTLTV 

2710 2730 2750 2770 2790 

AATGGTAAATTGAGTGGGCAAGGCACATTCCAATTTACTTCATCTTTATTTGGCTATAAAACCGATAAATTAAAATTATCCAATGACGCT 
NGKLSGQGTFQFTSSLFGYKSDKLKLSNDA 

2810 2830 2850 2870

6AGGGCGATTACATATTATCTCTTCGCAACACAGGCAAAGAACCCGAAACCCTTGAGCAATTAACTTTGGTT GAAAGCAAAGATAATCAA 
HGDYILSVRNTGKEPETLEQLTLVESKONQ 
2890 2910 2930 2950 2970

CCGTTaSaGATAAGCTCAAATTTAOTAGA^^ 

2990 3010 3030 3050 

1TCCCCTTGCATAACCCAATAAAAGACCAC6AATTGCACAAT6ATTTACTAACAGCAGACCAACCAGAAC6AACATTA6AACCCAAACAA 
F R I H MPIKEQELHNDLVRAEQAERTL.. E AKQ 

3090 3110 3130 3150 

ffTTIUUL^GACTGCTAAAACACAAACAGGTGAGCCAAAACTGCGCTCAAGAAGAGCACCGAGAGCACCGTTTCCTCATACCCTCCCT 
til i jiAAii<iAi A|(TQT6E p KVRSRRAARAAF pOT L P D 

3170 3190 3210 3230 

CAJUGCamAAACGCATTAGAAGCCAAACAAGCTGAACTGACTGCTGAAACACAAAAAAGTAAGGCAAAAACAAAAAAAGTGCGGTCA 
QSILNA LEAKQAELTAETQICSKAKTKKVRS 

3250 3270 3290 3310 3330

AAAAGA6CA6TGTTTTCTCATCCCCTGCTTGATCAAA6CCTGTTCGCATTA6AAGCCGCACTTGAGGTTATTGAT6CCCCACA6CAATCG 
KR A V FS0PIL0QSLFALEAALEV10APQQS 

3350 3370 3390 3410 

CAAjja^TCCTCTAGCTCAAGAAGAAGCGGAAAAACAACGCAAACAAAAAGACTTGATCAGCCGTTATTCAAATAGTGCGTTATCAGAA 
£ It D RLAQEEAEKQRKQKOLISRYSNSALSE 

3430 3450 3470 3490 3510

TTATCTGCAACACTAAATACTATGCTTTCTGTTCAAGATGAATTAGATCGTCTTTTTGTAGATCAAGCACAATCTGCCGTGTGGACAAAT 
LSATVNSMLSVQDELDRLFVDQAQSAVWTN 

3530 3550 3570 3590

ATCGCACAGGATAAAAGACGCTATGATTCT6AT6CGTTCC6TGCTTATCAGCAGCA6AAAACGAACTTACGTCAAATTCGGGTGCAAAAA 

IAQDKRRYDSDA F RAYQQQKT N I RQI GVQK 

3610 3630 3650 3670 3690 

CCCTTAGCTAATGGACGAATTGGGGCACTTTTCTCGCATA6CC6TTCAGATAATACCTTTGATGAACAGGTTAAAAATCACGC6ACATTA 
ALANGRIGAVFSHSRSOMTFOEQVKHHATL 

3710 3730 3750 3770 

ACGATGATCTCGGGTTTTGCCCAATATCAATG6GCCGATTTACAATTTGGTGTAAACGTGG6AACGGGAATCAGTGCGACTAAAATGGCT 
TMMSGFAQYQWGDLQFGVNVGTGISASKHA 

3790 3810 3830 3850 3870 

GAAGAACAAAGCCGAAAAATTCATCGAAAAGCGATAAATTATGGCGTGAATGCAAGTTATCAGTTCCGTTTAGGGCAATTGGGCATTCAG 
EEQSRKI HRKA I NYGVRASYQFRL GQL GIQ 

3890 3910 3930 3950 

f rTT A TTTTG6ACTTAATCGCTATTTTATTGAACGTGAAAATTATCAATCTGAGGAACTGA6AGT6AAAACGCCTAGCCTTGCATTTAAT 
PYF6VMRYFI ERENYQSEEVRVKTPSLAFN 

3970 3990 4010 4030 4050 

CGCTATAATGCTGGCATTCGAGTTGATTATACATTTACTCCGACAGATAATATCAGCCTTAAGCCTTATTTCTTCGTCAATTATCTTGAT 

RYMACIRVDYT FTPTDNISVKPYFFVNYVD 

4070 4090 4110 4130 

GTTTCAAACGCTAACGTACAAACCACGGTAAATCTCACGGTGTTGCAACAACCATTTGGACGTTATTGGCAAAAAGAAGTGGGATTAAAG 
VSNANVQTTVNUTVLQQPFCRYWQKEVGIK 

4150 4170 4190 4210 4230

GCAGAAATTTTACATTTCCAAATTTCCGCTTTTATCTCAAAATCTCAAG6TTCACAACT 

AEILHFQISAFI SKSQGSQLGKQQNVGVKL 

4250 4270 4290 4310 

CGCTATCGTTGGTMAAATCAACATAATTTTATCGTTTATTGATAAACAAGGTGW^CAGA^CAGAT^CACCTTTTTTATTCCAATAAT 

G Y R W • 
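
The figure above pairs each row of nucleotide sequence with its one-letter translation. The Python sketch below is illustrative only and is not part of the specification. It shows how a legible stretch of nucleotides can be translated with the standard genetic code so that the printed translation can be checked against it. The function name `translate` and the choice to report unreadable codons as "X" are assumptions introduced here.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate DNA in frame 1; '*' marks a stop codon, 'X' an unreadable codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of the row beginning TCTATTAATTTAACTGAT in the figure.
print(translate("TCTATTAATTTAACTGATAATGCAACGGCG"))  # SINLTDNATA
```

A mismatch between a computed and a printed translation points to a misread base or residue in the scanned row, which should then be checked against the original figure.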
200

STYSDRTKYP 
GTYNDQNKYP 
GTYNDQNKYP
GEYNNSDKYP 
— Y KYP 



201 250 

Hap ERVRIGSGRQ F WRNDQ DKGDQVAGAY 

HK368IGA AFVRLGSGSQ FIYKKGDNYS LIL N NH EVGG NNLKLVGDAY 

HK393IGA YFVRLGSGTQ FIYENGTRYE LWL G KEGQKSDAQG YNLKLVGNAY 

HK715IGA AFVRLGSGSQ FIYKKGDNYS LIL N NH EVGG NNLKLVGDAY 

HK61IGA AFVRLGSGSQ FIYKKGSRYQ LILTEKDKQG NLLRNWDVGG DNLELVGNAY

Consensus — VR-GSG-Q F V— AY 

251 300

Hap HYLTAOJTHN QRGAGNGYSY LOG D VRKAGEYGPL PIAGSKGDSG 

HK368IGA TYGIAGIPYK VNHENNGLIG FGNSKEEHSD PKGILSQDPL TNYAVLGDSG 

HK393IGA TYGIAGIPYE VNHENDGLIG PGOSNNEYIN PKEILSKKPL TNYAVLGDSG 

HK715IGA TYGIAGIPYK VNHENNGLIG FGNSKEQiSD PKGILStfDPL TNYAVLQ3SG 

HK61IGA TYGIAGIPYK VNHENNGLIG FC3JSKEEHSD PKGILSQDPL TNYAVLO^SG 

Consensus -Y — AG G G PL GQSG 

301 350 

Hap SPMFIYDAEK QKWLINGILR EGNPFEGKEN (3X2LVRKSYF D.EIFERDLH 

HK368IGA SPLFVYDREK Q<WLFLGSYD FWAGYN KKSWQ EWNIYKSQET 

HK393IGA SPIFVXDREK GKWLFLGSYD YWAGYN KKSWQ EWNIYKPEFA 

HK715IGA SPLEVYDREK OOTLFLGSYD FWAGYN KKSW3 EWNIYKPEFA 

HK61IGA SPLFVYDREK G<WLFLGSYD FWAGYN KKSWQ EWNIYKHEFA 

Consensus SEzE-YD-EK -KWL — G KS 1 

351 400 

Hap TSLYTRAGNG VYTISGNDNG QGSITQKSGI PSEIKITLAN MSLPLKEKDK 

HK368IGA KDVLNKDSAG SLIGSKTDYS WSSNGKTSTI TGGEK S LNVDLAD. . . 

HK393IGA EKIYBQfYSAG SLIGSKTDYS WSSNGKTSTI TGGEK S LNVDLAD. . . 

HK715IGA KTVLDKDTAG SLTGSNTQYN WNPTGKTSVI SNGSE S LNVDLFD. . . 

HK61IGA EKIYQQYSAG SLTGSNTQYT WQATGSTSTI TQQGE P LSVDLTD. . . 

Consensus G S S-I L 

401 450 

Hap VHNPRYDGPN IYSPRIUNGE TLYFMXKQG SLIFASDINQ GAGGLYFBGN 

HK368IGA GKD KPNHGK SVTFEG. .SG TLTLNNNIDQ GAGGLFFEGD 

HK393IGA Q<D KPNHGK SVTFEG. .SG TLTLNNNIDQ GAGGLFFEGD 

HK715IGA SSQP TDSKKNNHGK SVTLRG. .SG TLTLNNNIDQ GAGGLFFEGD 

HK61IGA GKD KPNHGK SITLKG. .SG TLTLNNHIDQ GAGGLFFEGD 

Consensus N-G G -L I-Q GAGGL-FEG- 

451 500 

Hap FTVSPNSNQ. TWQGAGIHVS ENSTVTWKVN GVEHDRLSKI GKGTLHVQAK 

HK368IGA YEVKGTSDNT TWKGAGVSVA EX3KTVTWKVH NPQYDRLAKI GKGTLIVBGT 

HK393IGA YEVKGTSDNT TWKGAGVSVA EGKTVTWKVH NPQYDRLAKI GKGTLIVEGT

HK715IGA YEVKGTSDST TWKGAGVSVA DGKTVTWKVH NPKSDRLAKI GKGTLIVBGK 

HK61IGA YEVKGTSDST TWKGAGVSVA DGKTVTWKVH NPKYDRLAKI GKGTLVVEGK

Consensus — V S TVf-GAG — V TVTWKV DRL-KI GKGTL-V 

NLITDPNNVS IYYVKPLEDD NPYAIRQIKY GYQLYFNEEN RTYYALKKDA
SLITDPNTIT PYNIDAPDED NPYAFRRIKD GGQLYLNLEN YTYYALRKGA. 

1001 1050

Hap PTSAEHRFNT LTVNGKLSGQ GTFQFTSSIF GYKSDKIKLS NDABGDYILS 

HK368IGA YNT LTVNS.LSGN GSFYYLTDLS NKQGDKVWT KSATGNFTLQ 

HK393IGA YNT LTVNS.LSGN GSFYYLTDLS NKQGDKVWT KSATGNFTLQ 

HK715IGA YNT LTVNS.LSGN GSFYYLTDLS NKQGDKVWT KSATGNFTLQ 

HK61IGA YNT LTVNS.LSGN GSFYYWVDFT NNKSNKVWN KSATGNFTLQ 

Consensus NT LTVN — LSG- G-F K A-G L- 

1051 1100 

Hap VRNTGKEPET LEQLTLVESK DNQPLSDKIX FTLENDHVDA GALRYKLVKN 

HK368IGA VADKK3SPNH .NELTLEDAS KAQR. .DHIN VSLVGNTVDL GAWKYKIi^NV 

HK393IGA VADKTGEPNH .NELHTOAS KAQR. .DHIN VSLVGNTVDL GAWKYKLRNV 

HK715IGA VADKTSPTK .NELTLEDAS NATR. .NO VSLVGNTVDL GAWKYKIPNV 

HK61IGA VADKTCSPNH .NELTLEDAS NATR. .NNLE VTLANGSVDR GAWKYKLRNV 

Consensus V EP LTL L L VD- GA — YKL 

1101 1150 

Hap DGEFRLHNPI KEQEUflOLV 

HK368IGA NGRYDLYNP. .EVEKRNQTV DTTNITIPNN IQADVPSVPS NNEEIAKVDE 

HK393IGA NGRYDLYNP. .EVEKRNQTV DTTNITIPNN IQADVPSVPS NNEEIAKVDE 

HK715IGA NGRYDLYNP. .EVEKRNQTV DTTNITIPNN IQADVPSVPS NNEEIAKV.E 

HK61IGA NGRYDLYNP. .EVEKRNQTV DTTNITTPND IQADAPSAQS NNEEIAKV.E 

Consensus -G L-NP E-E — N— V 

1151 1200 

Hap 

HK368IGA AFVPPPAPAT 

HK393IGA APVPPPAPAT

HK715IGA TPVPPPAPAT

HK61IGA TPVPPPAPAT ESAIASEQPE TRPAETAQPA MEETNTANST ETAPKSDTAT 

Consensus 

Hap RAEQAERTLE AKQVEPT

HK368IGA PSETTETVAE NSKQESKTVE KNEQDATETT AQNREVAKEA 

HK393IGA PSETTETVAE NSKQESKTVE KNEQDATETT AQNREVAKEA 

HK715IGA PSETTETVAE NSKQESKTVE KNEQDATETT AQNGEVAEEA 

HK61IGA QTENPNSESV PSETTEKVAE NPPQEHETVA KNEQEATEPT PQNGEVAKED 

Consensus -Q- — t T — 
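
The Consensus rows of the alignment above appear to show a residue only where every aligned sequence carries the same residue. The Python sketch below is illustrative only and is not part of the specification. It computes such a strict-identity consensus for one aligned block. The identity rule, the function name `consensus` and the use of "-" for non-conserved columns are assumptions introduced here. The Hap segment is entered as GAGGLYFEGN, following the translation shown in the nucleotide sequence figure.

```python
def consensus(rows, gap="-"):
    """Return a strict-identity consensus for equal-length aligned rows.

    A column yields its residue only if all rows agree and the shared
    character is not an alignment gap ('.' or ' '); otherwise it yields `gap`.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    out = []
    for column in zip(*rows):
        conserved = len(set(column)) == 1 and column[0] not in ". "
        out.append(column[0] if conserved else gap)
    return "".join(out)


# Last ten columns of the 401-450 block: Hap followed by the four IgA protease rows.
rows = [
    "GAGGLYFEGN",  # Hap
    "GAGGLFFEGD",  # HK368IGA
    "GAGGLFFEGD",  # HK393IGA
    "GAGGLFFEGD",  # HK715IGA
    "GAGGLFFEGD",  # HK61IGA
]
print(consensus(rows))  # GAGGL-FEG-
```

The result agrees with the Consensus entry printed for that segment, which supports the assumption that the rows mark strict identity across all five sequences.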